Abstract

Abstract interpretation is a general methodology for systematic development of program 
analyses. An abstract interpretation framework is centered around a parametrized non- 
standard semantics that can be instantiated by various domains to approximate different 
program properties. 

Many abstract interpretation frameworks and analyses for Prolog have been proposed, 
which seek to extract information useful for program optimization. Although motivated by 
practical considerations, notably making Prolog competitive with imperative languages, 
such frameworks fail to capture some of the control structures of existing implementations 
of the language. 

In this paper we propose a novel framework for the abstract interpretation of Prolog 
which handles the depth-first search rule and the cut operator. It relies on the notion 
of substitution sequence to model the result of the execution of a goal. The framework 
consists of (i) a denotational concrete semantics, (ii) a safe abstraction of the concrete 
semantics defined in terms of a class of post-fixpoints, and (iii) a generic abstract in- 
terpretation algorithm. We show that traditional abstract domains of substitutions may 
easily be adapted to the new framework, and provide experimental evidence of the ef- 
fectiveness of our approach. We also show that previous work on determinacy analysis, 
that was not expressible by existing abstract interpretation frameworks, can be seen as 
an instance of our framework. 

The ideas developed in this paper can be applied to other logic languages, notably to 
constraint logic languages, and the theoretical approach should be of general interest for 
the analysis of many non-deterministic programming languages. 



1 Introduction 



Abstract interpretation (Cousot and Cousot, 1977) is a general methodology for 
systematic development of program analyses. It has been applied to various for- 
malisms and paradigms including flow-charts and imperative, functional, logic, and 
constraint programming. 
Abstract interpretation of Prolog and, more generally, of logic programming was
initiated by Mellish (1987) and further developed by numerous researchers, e.g.,
Bruynooghe (1991), Cousot and Cousot (1992a), Jones and Søndergaard (1987), Le
Charlier et al. (1991), and Marriott and Søndergaard (1989b). Many different kinds of
practical analyses and optimizations have been proposed, a detailed description of
which can be found in (Cousot and Cousot, 1992a; Getzinger, 1994). Briefly, mode
(Cortesi et al., 1991; Debray, 1989; Debray and Warren, 1988; Somogyi, 1987),
type (Barbuti and Giacobazzi, 1992; Cortesi et al., 1995; Gang and Zhiliang, 1986;
Janssens and Bruynooghe, 1992; Kanamori and Horiuchi, 1985; Kieburtz, 1983;
Kluzniak, 1987; Leivant, 1983; Mycroft and O'Keefe, 1984; Xu and Warren, 1988;
Yardeni and Shapiro, 1991), and aliasing (Codish et al., 1991; Jacobs and Langen,
1989) analyses collect information about the state of variables during the execution
and are useful to speed up term unification and make memory allocation more
efficient (Hermenegildo et al., 1992; Warren et al., 1988). Sharing analysis (Corsini,
1991; Cortesi and Filé, 1991; Kluzniak, 1988; Muthukumar and Hermenegildo, 1991)
is similar to aliasing except that it refers to the sharing of memory structures
to which program variables are instantiated; it is useful to perform compile-time
garbage collection (Jensen and Mogensen, 1990; Kluzniak, 1988; Mulkers et al.,
1990) and automatic parallelization (Cabeza Gras and Hermenegildo, 1994; Chang
et al., 1985; Giacobazzi and Ricci, 1990; Jacobs and Langen, 1992). Reference chain
analysis (Marien et al., 1989; Van Roy and Despain, 1992) attempts to determine
an upper bound on the length of the pointer chain for a program variable. Trailing
analysis (Taylor, 1989) aims at detecting variables which do not need to be
trailed. Liveness analysis (Mulkers, 1991) determines when memory structures can
be reused and is useful to perform update-in-place.

All these analyses approximate the set of values (i.e., terms or memory structures)
to which program variables can be instantiated at some given program point.
It is thus not surprising that almost all frameworks for the abstract interpretation of
Prolog, e.g., (Barbuti et al., 1993; Bruynooghe, 1991; Jones and Søndergaard, 1987;
Marriott, 1993; Marriott and Søndergaard, 1989b; Mellish, 1987; Nilsson, 1990), are
based on abstractions of sets of substitutions. Such traditional frameworks ignore
important control features of the language, like the depth-first search strategy and
the cut operator. The reason is that these control features are difficult to model
accurately, and yet not strictly necessary for a variable level analysis. However,
modeling Prolog control features has two main advantages. First, it allows one to
perform so-called predicate level analyses, like determinacy (Giacobazzi and Ricci,
1992; Sahlin, 1991; Ueda, 1987; Van Roy et al., 1987; Van Roy and Despain, 1992)
and local stack (Marien and Demoen, 1989; Meier, 1991) analyses. These analyses
are not captured by traditional abstract interpretation frameworks; they usually
rely on some ad hoc technique and require special-purpose proofs of correctness,
e.g., (Debray and Warren, 1989; Sahlin, 1991), which may be rather involved. They
are useful to perform optimizations, such as choice point removal and the
simplification of environment creation. Second, the analysis of some classes of programs,
like programs containing multi-directional procedures which use cuts and
meta-predicates to select among different versions, may be greatly improved. This may
provide the compiler with more chances to perform important optimizations such
as dead-code elimination.

Abstract interpretation of Prolog with control has been investigated by other
authors. In particular, we know of three main approaches. The approach
of Barbuti et al. (1993) is based on an abstract semantics for logic programs with
control which is parametric with respect to a "termination theory". The latter is
intended to be provided from outside, for instance by applying proof procedures.
Filé and Rossi (1993) propose an operational and non-compositional abstract
interpretation framework for Prolog with cut consisting of a tabled interpreter to visit
OLDT abstract trees decorated with information about sure success or failure of
goals. Finally, Spoto (2000) defines an abstract goal-independent denotational
semantics for Prolog handling control rules and cut. Program denotations are adorned
with "observability" constraints giving information about divergent computations
and cut executions. We know of no experimental results validating the effectiveness
of these approaches.

In this paper we present a novel abstract interpretation framework for Prolog 
which models the depth-first search rule and the cut operator. It relies on the 
notion of substitution sequence which allows us to collect the solutions to a goal 
together with information such as sure success and failure, the number of solutions, 
and/or termination. The framework that we propose can be applied to perform 
predicate level analyses, such as determinacy, which were not expressible by classical 
frameworks, and can be also used to improve the accuracy of existing analyses. 
Experiments on a sample analysis, namely cardinality analysis, will be discussed. 



1.1 Some Motivating Examples 

In this section we illustrate by means of small examples the functionality of our 
static analyzer and we discuss how it improves on previous abstract interpretation 
frameworks. Experimental results on medium-size programs will be reported later. 

The first two examples show that predicate level properties, such as determinacy,
which are out of the scope of traditional abstract interpretation frameworks, can
be captured by our analyzer. To the best of our knowledge, there does not exist any
specific analysis which can infer determinacy of all the programs that are discussed
hereafter.

Consider first the procedure is_last:

is_last(X, [X]).
is_last(X, [_|T]) :- is_last(X, T).

When given the input pattern is_last(var, ground), where var and ground
denote the set of all variables and the set of all ground terms respectively, our
analysis returns the abstract sequence (is_last(ground, [ground|ground]), 0, 1, pt),
where is_last(ground, [ground|ground]) is the pattern characterizing the output
substitutions, 0 and 1 are, respectively, the minimum and the maximum number of
returned output substitutions, and pt stands for "possible termination".

Consider now the following two versions of the procedure partition.

partition([], P, [], []).
partition([S|T], P, [S|Ss], Bs) :- S < P, !, partition(T, P, Ss, Bs).
partition([B|T], P, Ss, [B|Bs]) :- partition(T, P, Ss, Bs).

partition([], P, [], []).
partition([S|T], P, [S|Ss], Bs) :- leq(S, P), partition(T, P, Ss, Bs).
partition([B|T], P, Ss, [B|Bs]) :- gt(B, P), partition(T, P, Ss, Bs).
leq(K1-V1, K2-V2) :- K1 =< K2.
gt(K1-V1, K2-V2) :- K1 > K2.

Note that the second version of the procedure calls arithmetic predicates through
an auxiliary predicate and is appropriate for a key sort. Given the input pattern
partition(ground, ground, var, var), our analysis returns in both cases the abstract
sequence (partition(ground, ground, ground, ground), 0, 1, pt). Input/output
patterns are used to determine that the first clause and the two others are
mutually exclusive in both programs, while the cut (in the first version) and the
abstraction of arithmetic predicates (in the second version) determine the mutual
exclusion of the second and the third clause. Thus we can infer determinacy of both
versions of the procedure partition.

As stated above, we do not know of any static analysis for logic programs which
can infer determinacy of all these programs. For instance, the analysis developed
by Debray and Warren (1989) to detect functional computations of a logic program
cannot infer determinacy of the procedure is_last; the determinacy analysis
proposed by Dawson et al. (1993) can handle the second version of the
procedure partition but not the first, since it does not deal
with the cut; for the same reason, the analysis of Giacobazzi and Ricci (1992)
cannot treat the first version of the procedure partition; and the cardinality analysis
defined by Sahlin (1991) cannot handle any of the examples discussed above since
it ignores predicate arguments.

The next example shows that the use of abstract sequences can improve on the 
analysis of variable level properties such as modes. 

Consider the procedure compress(L,Lc), which relates two lists Lc and L such
that Lc is a compressed version of L. For instance, the compressed version of the list
[a, b, b, c, c, c] is [a, 1, b, 2, c, 3]. A library can contain the definition
of a single procedure to handle both compression and decompression as follows.

compress(A,B) :- var(A), !, decmp(A,B).
compress(A,B) :- cmp(A,B).

cmp([],[]).
cmp([C],[C,1]).
cmp([C1,C2|T],[C1,1,C2,N|Rest]) :- C1<>C2, cmp([C2|T],[C2,N|Rest]).
cmp([C1,C1|T],[C1,N1|Rest]) :- cmp([C1|T],[C1,N|Rest]), N1 := N+1.

decmp([],[]).
decmp([C],[C,1]).
decmp([C1,C2|T],[C1,1,C2,N|Rest]) :- decmp([C2|T],[C2,N|Rest]), C1<>C2.
decmp([C1,C1|T],[C1,N1|Rest]) :- N1 > 1, N := N1-1,
                                 decmp([C1|T],[C1,N|Rest]).

Given the input patterns compress(ground, var) and compress(var, ground),
our analysis returns the abstract sequence (compress(ground, ground), 0, 1, pt)
for both inputs. This example illustrates many of the functionalities of our
system, including input/output patterns, abstraction of arithmetic and
meta-predicates, and the cut, all of which are necessary to obtain the optimal precision.
In addition, it shows that taking the cut into account improves the analysis of
modes. Indeed, a mode analysis ignoring the cut would return the output pattern
compress(novar, ground) for the input pattern compress(var, ground), losing the
groundness information. None of the abstract interpretation algorithms for logic
programs we know of can handle this example with an optimal result. Moreover, if
a program only uses the input pattern compress(var, ground), our analysis detects
that the second clause of compress is dead code without any extra processing, since
no input/output pattern exists for cmp. The second clause, the test var, and the
cut of the first clause can then be removed by an optimizer.

Notice that there exist implemented tools for the static analysis of Prolog
programs, such as PLAI (Muthukumar and Hermenegildo, 1992), which can achieve as
accurate success and dead-code information as our analyzer. However, such tools
usually integrate several analyses based on different techniques which are not all
justified by the abstract interpretation framework. The example of the procedure
compress shows that our analyzer can handle control features of the language within
the abstract interpretation framework without the need of any extra consideration.



1.2 Sequence-Based Abstract Interpretation of Prolog



An abstract interpretation framework (Cousot and Cousot, 1992b) is centered
around the definition of a non-standard semantics approximating a concrete
semantics of the language.

Most top-down abstract interpretation frameworks for logic programs, see, for
instance, (Le Charlier and Van Hentenryck, 1994; Marriott and Søndergaard, 1989a;
Mellish, 1987; Muthukumar and Hermenegildo, 1992; Nilsson, 1990; Warren, 1992;
Winsborough, 1992), can be viewed as abstractions of a concrete structural operational
semantics (Plotkin, 1981). Such a semantics defines the meaning of a program as a
transition relation described in terms of transition rules of the form
$(\theta, o) \longmapsto \theta'$, where the latter expresses the fact that $\theta'$
is a possible output from the execution of the construct $o$ (i.e., a procedure, a
clause, etc.) called with input $\theta$. This structural operational semantics can
easily be rephrased as a fixpoint semantics mapping any input pattern $(\theta, o)$
to the set of all corresponding outputs $\theta'$. The fixpoint semantics
can then be lifted to a collecting semantics that maps sets of inputs to sets
of outputs and is defined as the least fixpoint of a set-based transformation. The
non-standard (or abstract) semantics is identical to the collecting one except that
it uses abstract values instead of sets and abstract operations instead of operations
over sets. Finally, an abstract interpretation algorithm can be derived by
instantiating a generic fixpoint algorithm (Le Charlier and Van Hentenryck, 1993) to the
abstract semantics.
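
To make the phrase "least fixpoint of a set-based transformation" concrete, the following is a minimal sketch in Python of Kleene iteration over finite sets. It is only an illustration under stated assumptions: the names `lfp`, `EDGES` and `collecting_step` are hypothetical, and the toy transition table stands in for a program; none of this is taken from the framework itself.

```python
from typing import Callable, FrozenSet


def lfp(transform: Callable[[FrozenSet[str]], FrozenSet[str]]) -> FrozenSet[str]:
    """Kleene iteration: apply a monotone set transformation, starting
    from the empty set, until the result no longer changes."""
    current: FrozenSet[str] = frozenset()
    while True:
        nxt = transform(current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical toy "program": each state has a set of successor states.
EDGES = {"a": {"b"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}


def collecting_step(states: FrozenSet[str]) -> FrozenSet[str]:
    """Set-based transformation: keep the entry state "a" and add every
    one-step successor of the states collected so far."""
    out = {"a"}
    for s in states:
        out |= EDGES.get(s, set())
    return frozenset(out)


reachable = lfp(collecting_step)
```

The iteration terminates here because the state space is finite. An abstract semantics replaces the sets by abstract values, for which termination generally has to be enforced by other means.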

The limitations of traditional top-down frameworks for Prolog stem from the fact
that structural operational semantics are unable to take the depth-first search rule
into account. Control operators such as the cut cannot be modeled and are thus
simply ignored. To overcome these limitations, we propose a concrete semantics of
Prolog which describes the result of program executions in terms of substitution
sequences. This allows us to model the depth-first search rule and the cut operator.
The semantics is defined in the denotational setting to deal with sequences
resulting from the execution of infinite computations. Moreover, it is still compositional,
allowing us to reuse most of the material of our previous works, i.e., the abstract
domains and the generic algorithm (Le Charlier and Van Hentenryck, 1994).

However, technical problems arise when applying the abstract interpretation
approach described above. Let us informally explain the main ideas behind the
definition of our framework.

First, we define a concrete semantics as the least fixpoint of a concrete
transformation TCB mapping every so-called concrete behavior $\longmapsto$ to another
concrete behavior $\longmapsto'$. The notion of concrete behavior is our denotation
choice for a Prolog program: it is a function that maps pairs of the form
$(\theta, p)$ to a substitution sequence $S$, which intuitively represents the
sequence of computed answer substitutions returned by the query
$p(x_1, \ldots, x_n)\theta$. The fixpoint construction of the
concrete semantics relies on a suitable ordering $\sqsubseteq$ defined on sequences.

Second, a collecting transformation TCD is obtained by lifting the concrete
transformation TCB to sets of substitutions and sets of sequences. The transformation
TCD is monotonic with respect to set inclusion. However, its least fixpoint does
not safely approximate the concrete semantics. In fact, the least set with respect
to inclusion, that is the empty set $\{\}$, does not contain the least substitution
sequence with respect to $\sqsubseteq$, which is a special sequence denoted by
$\langle\bot\rangle$. The problem lies in the fact that an ordering on sets of
sequences that "combines" both the ordering $\sqsubseteq$ on sequences and the
ordering $\subseteq$ on sets is needed. This is
an instance of the power domain construction problem (Schmidt, 1988), which is
difficult in general. We choose a more pragmatic solution which consists in
restricting to chain-closed sets of sequences, i.e., sets containing the limit of every
increasing chain, with respect to $\sqsubseteq$, of their elements. We also introduce
the notion of pre-consistent collecting behavior which, roughly speaking, contains a lower
approximation, with respect to $\sqsubseteq$, of the concrete semantics (the least
fixpoint of TCB). The transformation TCD maps pre-consistent collecting behaviors to other
pre-consistent ones. Moreover, assuming that sets of sequences are chain-closed, any
pre-consistent post-fixpoint, with respect to set inclusion, of TCD safely
approximates the concrete semantics. These results imply that a safe collecting behavior
can be constructed by iterating on TCD from any initial pre-consistent collecting
behavior and by applying some widening techniques (Cousot and Cousot, 1992c)
in order to reach a post-fixpoint.
Third, the abstract semantics is defined exactly as the collecting one except that
it is parametric with respect to the abstract domains. In fact, we do not explicitly
distinguish between the collecting and the abstract semantics: in our presentation,
the collecting transformation TCD is just a particular instance of the (generic)
abstract transformation TAB.

Finally, a generic abstract interpretation algorithm is derived from the abstract
semantics. The algorithm is essentially an instantiation of the universal fixpoint
algorithm described in (Le Charlier and Van Hentenryck, 1993).



1.3 Plan of the Paper 

The paper is organized as follows. Section || and Section || describe, respectively,
our concrete and abstract semantics for pure Prolog augmented with the cut. The
generic abstract interpretation algorithm is discussed in Section ^[. Section ^| is a
revised and extended version of (Braem et al., 1994). It describes an instantiation
of our abstract interpretation framework to approximate the number of solutions
to a goal. Experimental results are reported. In Section || we consider related works
on determinacy analysis. Section [^ concludes the paper.



2 Concrete semantics 

This section describes a concrete semantics for pure Prolog augmented with the cut. 
The concrete semantics is the link between the standard semantics of the language 
and the abstract one. Our concrete semantics is denotational and is based on the 
notion of substitution sequence. Correctness of the concrete semantics with respect 
to Prolog standard semantics, i.e., OLD-resolution, is discussed. Most proofs are 
omitted here; all details can be found in (Le Charlier et al., 1996).



2.1 Syntax

The abstract interpretation framework presented in this paper assumes that
programs are normalized according to the abstract syntax given in Fig. 1. The variables
occurring in a literal are distinct; distinct procedures have distinct names; all clauses
of a procedure have exactly the same head; if a clause uses $m$ different program
variables, these variables are $x_1, \ldots, x_m$.



2.2 Basic Semantic Domains 

This section presents the basic semantic domains of substitutions. Note that we
assume a preliminary knowledge of logic programming; see, for instance, (Apt, 1997;
Lloyd, 1987).



Variables and Terms. We assume the existence of two disjoint and infinite sets
of variables, denoted by PV and SV. Elements of PV are called program variables
and are denoted by $x_1, x_2, \ldots, x_i, \ldots$. The set PV is totally ordered;
$x_i$ is the $i$-th element of PV. Elements of SV are called standard variables and
are denoted by letters $y$ and $z$ (possibly subscripted). Terms are built using
standard variables only.

Fig. 1. Abstract syntax of normalized programs

Standard Substitutions. Standard substitutions are substitutions in the usual
sense which use standard variables only. The set of standard substitutions is
denoted by SS. Renamings are standard substitutions that define a permutation of
standard variables. The domain and the codomain of a standard substitution $\sigma$ are
denoted by $dom(\sigma)$ and $codom(\sigma)$, respectively. We denote by
$mgu(t_1, t_2)$ the set of standard substitutions that are a most general unifier
of terms $t_1$ and $t_2$.

Program Substitutions. A program substitution is a set
$\{x_{i_1}/t_1, \ldots, x_{i_n}/t_n\}$, where $x_{i_1}, \ldots, x_{i_n}$ are
distinct program variables and $t_1, \ldots, t_n$ are terms. Program
substitutions are not substitutions in the usual sense; they are best
understood as a form of program store which expresses the state of the computation
at a given program point. It is meaningless to compose them as usual
substitutions or to use them to express most general unifiers. The domain of a program
substitution $\theta = \{x_{i_1}/t_1, \ldots, x_{i_n}/t_n\}$, denoted by
$dom(\theta)$, is the set of program variables $\{x_{i_1}, \ldots, x_{i_n}\}$.
The codomain of $\theta$, denoted by $codom(\theta)$, is the set of
standard variables occurring in $t_1, \ldots, t_n$. Program and standard substitutions
cannot be composed. Instead, standard substitutions are applied to program
substitutions. The application of a standard substitution $\sigma$ to a program substitution
$\theta = \{x_{i_1}/t_1, \ldots, x_{i_n}/t_n\}$ is the program substitution
$\theta\sigma = \{x_{i_1}/t_1\sigma, \ldots, x_{i_n}/t_n\sigma\}$.
The set of program substitutions is denoted by PS. The application $x_i\theta$ of a
program substitution $\theta$ to a program variable $x_i$ is defined only if
$x_i \in dom(\theta)$; it denotes the term bound to $x_i$ in $\theta$. Let $D$ be a
finite subset of PV and $\theta$ be a program substitution such that
$D \subseteq dom(\theta)$. The restriction of $\theta$ to $D$, denoted by
$\theta_{/D}$, is the program substitution such that $dom(\theta_{/D}) = D$ and
$x_i(\theta_{/D}) = x_i\theta$, for all $x_i \in D$. We denote by $PS_D$ the set of
program substitutions with domain $D$.

Canonical Program Substitutions. We say that two program substitutions $\theta$
and $\theta'$ are equivalent if and only if there exists a renaming $\rho$ such that
$\theta\rho = \theta'$. We assume that, for each program substitution $\theta$, we
are given a canonical representative, denoted by $[\theta]$, of the set of all
program substitutions that are equivalent to $\theta$. We denote by CPS the set of
all canonical program substitutions $[\theta]$. For any
finite set of program variables $D$, we denote by $CPS_D$ the set $PS_D \cap CPS$.
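
The text only assumes that some canonical representative is given. One possible choice, sketched below in Python, renames standard variables in order of first occurrence. Everything in the sketch is an assumption made for illustration: terms are encoded as strings (standard variables) or tuples `(functor, arg1, ...)`, program variables are encoded by their integer index, and the function name `canonical` is hypothetical.

```python
def canonical(theta):
    """Return a canonical variant of a program substitution.

    Standard variables are renamed to "_0", "_1", ... in order of first
    occurrence, scanning the bindings by increasing program-variable index.
    """
    names = {}

    def rename(term):
        if isinstance(term, str):          # a standard variable
            if term not in names:
                names[term] = f"_{len(names)}"
            return names[term]
        functor, *args = term              # compound term or constant
        return (functor, *[rename(a) for a in args])

    return {i: rename(theta[i]) for i in sorted(theta)}


# Two substitutions that differ only by a renaming of standard variables.
theta1 = {1: ("f", "y", "z"), 2: "y"}
theta2 = {1: ("f", "u", "v"), 2: "u"}
# A substitution that is not equivalent to the two above.
theta3 = {1: ("f", "y", "y"), 2: "y"}
```

With this choice, `canonical(theta1)` and `canonical(theta2)` coincide, while `canonical(theta3)` differs, which matches the equivalence defined above.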

2.3 Program Substitution Sequences 

Program substitution sequences are intended to model the sequence of computed 
answer substitutions returned by a goal, a clause, or a procedure. 

Program Substitution Sequences. Let us denote by $\mathbb{N}^*$ the set of positive
natural numbers. A program substitution sequence is either a finite sequence of the form
$\langle\theta_1, \ldots, \theta_n\rangle$ ($n \geq 0$) or an incomplete sequence of
the form $\langle\theta_1, \ldots, \theta_n, \bot\rangle$ ($n \geq 0$) or an infinite
sequence of the form $\langle\theta_1, \ldots, \theta_i, \ldots\rangle$
($i \in \mathbb{N}^*$), where the $\theta_i$ are program substitutions with the same
domain. We use the notation $\langle\theta_1, \ldots, \theta_i, \_\rangle$ to
represent a program substitution sequence when it is not known
whether it is finite, incomplete or infinite. Let $S$ be a program substitution
sequence. We denote by $Subst(S)$ the set of program substitutions that are elements
of $S$. The domain of $S$ is defined when $S \neq \langle\rangle$ and
$S \neq \langle\bot\rangle$. In this case, $dom(S)$
is the domain of the program substitutions belonging to $Subst(S)$. The set of all
program substitution sequences is denoted by PSS. Let $D$ be a finite set of
program variables. We denote by $PSS_D$ the set of all program substitution sequences
with domain $D$ augmented with $\langle\rangle$ and $\langle\bot\rangle$. Let
$S \in PSS_D$ be a sequence $\langle\theta_1, \ldots, \theta_i, \_\rangle$ and
$D' \subseteq D$. The restriction of $S$ to $D'$, denoted by $S_{/D'}$, is the
program substitution sequence
$\langle\theta_{1/D'}, \ldots, \theta_{i/D'}, \_\rangle$. The number of elements of
$S$, including the special element $\bot$, is denoted by $Ne(S)$. The number of elements
of $S$ that are substitutions is denoted by $Ns(S)$. Sequence concatenation is denoted
by $::$ and it is used only when its first argument is a finite sequence.

Canonical Substitution Sequences. The canonical mapping $[\cdot]$ is lifted to
sequences as follows. Let $S$ be a program substitution sequence
$\langle\theta_1, \ldots, \theta_i, \_\rangle$. We
define $[S] = \langle[\theta_1], \ldots, [\theta_i], \_\rangle$. We denote by CPSS
the set of all canonical substitution sequences $[S]$ and by $CPSS_D$ the set
$PSS_D \cap CPSS$, for any finite subset $D$ of PV.

CPO's of Program Substitution Sequences. The sets $PSS$, $PSS_D$, $CPSS$ and
$CPSS_D$ can be endowed with a structure of pointed cpo as described below.

Definition 2.1 (Relation $\sqsubseteq$ on Program Substitution Sequences)
Let $S_1, S_2 \in PSS$. We define

$S_1 \sqsubseteq S_2$ iff either $S_1 = S_2$
    or there exist $S, S' \in PSS$ such that $S$ is finite,
    $S_1 = S :: \langle\bot\rangle$ and $S_2 = S :: S'$.

The relation $\sqsubseteq$ on program substitution sequences is an ordering and the pairs
$(PSS, \sqsubseteq)$, $(CPSS, \sqsubseteq)$, $(PSS_D, \sqsubseteq)$, and
$(CPSS_D, \sqsubseteq)$ are all pointed cpo's.
We denote by $(S_i)_{i \in \mathbb{N}}$ an increasing chain
$S_0 \sqsubseteq S_1 \sqsubseteq \ldots \sqsubseteq S_i \sqsubseteq \ldots$ in PSS,
whereas we denote by $\{S_i\}_{i \in \mathbb{N}}$ a not necessarily increasing
sequence of elements of PSS.
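
The ordering of Definition 2.1 can be checked mechanically on finite and incomplete sequences. The Python sketch below is an illustration under assumptions: sequences are encoded as tuples, the marker `BOT` plays the role of $\bot$ and may only appear in last position, infinite sequences are out of scope, and the names `BOT`, `is_incomplete` and `leq` are hypothetical.

```python
class _Bottom:
    """Marker for the special element written as bottom in the text."""

    def __repr__(self):
        return "BOT"


BOT = _Bottom()


def is_incomplete(seq):
    """A sequence is incomplete when its last element is BOT."""
    return len(seq) > 0 and seq[-1] is BOT


def leq(s1, s2):
    """Ordering of Definition 2.1 on finite or incomplete sequences:
    s1 is below s2 iff they are equal, or s1 = S :: <BOT> with S finite
    and s2 = S :: S' for some sequence S'."""
    if s1 == s2:
        return True
    if not is_incomplete(s1):
        return False
    prefix = s1[:-1]
    return s2[:len(prefix)] == prefix
```

For instance, `leq(("t1", BOT), ("t1", "t2"))` holds, whereas `leq(("t1",), ("t1", "t2"))` does not: a finite sequence is complete and is only below itself.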

Lazy Concatenation. Program substitution sequences are combined through the
operation $\Box$ and its extensions $\Box_{k=1}^{n}$ and $\Box_{k=1}^{\infty}$ defined below.

Definition 2.2 (Operation $\Box$)
Let $S_1, S_2 \in PSS$.

$S_1 \Box S_2 = S_1 :: S_2$  if $S_1$ is finite
              $= S_1$         if $S_1$ is incomplete or infinite.

Definition 2.3 (Operation $\Box_{k=1}^{n}$)
Let $\{S_k\}_{k \in \mathbb{N}^*}$ be an infinite sequence of program substitution
sequences (not necessarily a chain). For any $n \geq 1$, we define:

$\Box_{k=1}^{0} S_k = \langle\rangle$
$\Box_{k=1}^{n} S_k = (\Box_{k=1}^{n-1} S_k) \Box S_n$.

Definition 2.4 (Operation $\Box_{k=1}^{\infty}$)
Let $\{S_k\}_{k \in \mathbb{N}^*}$ be an infinite sequence of program substitution
sequences. The infinite sequence $\{S'_i\}_{i \in \mathbb{N}}$ where
$S'_i = (\Box_{k=1}^{i} S_k) \Box \langle\bot\rangle$ ($i \in \mathbb{N}$) is a chain.
So we are allowed to define:

$\Box_{k=1}^{\infty} S_k = \sqcup_{i=0}^{\infty} S'_i = \sqcup_{i=0}^{\infty} ((\Box_{k=1}^{i} S_k) \Box \langle\bot\rangle)$.

The operation $\Box$ is associative; hence, it is meaningful to write
$S_1 \Box \ldots \Box S_n$ instead of $\Box_{k=1}^{n} S_k$. Operations $\Box$,
$\Box_{k=1}^{n}$ and $\Box_{k=1}^{\infty}$ are continuous with respect to the
ordering $\sqsubseteq$ on program substitution sequences.
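
Definitions 2.2 and 2.3 can be sketched in Python as follows. The encoding is an assumption: sequences are tuples, the marker `BOT` stands for $\bot$ and may only appear last, and infinite sequences (hence Definition 2.4) are not covered. The names `lazy_concat` and `lazy_concat_all` are hypothetical.

```python
BOT = object()  # marker for the special element; only allowed in last position


def is_finite(seq):
    """A (non-infinite) sequence is finite when it does not end with BOT."""
    return len(seq) == 0 or seq[-1] is not BOT


def lazy_concat(s1, s2):
    """Definition 2.2: append s2 only if s1 is finite; an incomplete s1
    hides everything that would come after it."""
    return s1 + s2 if is_finite(s1) else s1


def lazy_concat_all(seqs):
    """Definition 2.3: fold lazy_concat over a finite list of sequences,
    starting from the empty sequence."""
    result = ()
    for s in seqs:
        result = lazy_concat(result, s)
    return result
```

As an example, `lazy_concat_all([("a",), ("b", BOT), ("c",)])` yields `("a", "b", BOT)`: the solutions of the third sequence are never reached because the second computation does not complete.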

Program Substitution Sequences with Cut Information. Program substitu- 
tion sequences with cut information are used to model the result of a clause together 
with information on cut executions. 

Let CF be the set of cut flags $\{cut, nocut\}$. A program substitution sequence
with cut information is a pair $(S, cf)$ where $S \in PSS$ and $cf \in CF$.

Definition 2.5 (Relation $\sqsubseteq$ on Substitution Sequences with Cut Information)
Let $(S_1, cf_1), (S_2, cf_2) \in PSS \times CF$. We define

$(S_1, cf_1) \sqsubseteq (S_2, cf_2)$ iff either $S_1 \sqsubseteq S_2$ and $cf_1 = cf_2$
    or $S_1 = \langle\bot\rangle$ and $cf_1 = nocut$.

The relation $\sqsubseteq$ on program substitution sequences with cut information is an
ordering. Moreover, the pairs $(PSS \times CF, \sqsubseteq)$,
$(PSS_D \times CF, \sqsubseteq)$, $(CPSS \times CF, \sqsubseteq)$
and $(CPSS_D \times CF, \sqsubseteq)$ are all pointed cpo's.

We extend the definition of the operation $\Box$ to program substitution sequences
with cut information. The extension is continuous in both arguments.

Definition 2.6 (Operation $\Box$ with Cut Information)
Let $(S_1, cf) \in PSS \times CF$ and $S_2 \in PSS$. We define

$(S_1, cf) \Box S_2 = S_1 \Box S_2$  if $cf = nocut$
                    $= S_1$          if $cf = cut$.
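
Definition 2.6 is what lets a cut discard the solutions of later clauses. The Python sketch below illustrates it under assumptions: sequences are tuples, `BOT` marks an incomplete sequence, cut flags are the strings `"cut"` and `"nocut"`, and the function names are hypothetical.

```python
BOT = object()  # marker for the special element; only allowed in last position


def lazy_concat(s1, s2):
    """Definition 2.2 on tuples: append s2 only if s1 does not end with BOT."""
    finite = len(s1) == 0 or s1[-1] is not BOT
    return s1 + s2 if finite else s1


def concat_with_cut(pair, s2):
    """Definition 2.6: if the first result executed a cut, the second
    sequence is discarded; otherwise the two are lazily concatenated."""
    s1, cf = pair
    return s1 if cf == "cut" else lazy_concat(s1, s2)


# Result of a first clause that produced one solution and executed a cut,
# followed by the solutions of the remaining clauses.
with_cut = concat_with_cut((("a",), "cut"), ("b", "c"))
without_cut = concat_with_cut((("a",), "nocut"), ("b", "c"))
```

Here `with_cut` is `("a",)` and `without_cut` is `("a", "b", "c")`, which mirrors how a cut in an earlier clause prunes the alternatives of a procedure.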



2.4 Concrete Behaviors 

The notion of concrete behavior provides a mathematical model for the input/output
behavior of programs. To simplify the presentation, we do not parameterize the
semantics with respect to programs. Instead, we assume a given fixed underlying
program $P$.

Definition 2.7 (Concrete Underlying Domain)
The concrete underlying domain, denoted by CUD, is the set of all pairs $(\theta, p)$ such
that $p$ is the name of a procedure $pr$ of $P$ and
$\theta \in CPS_{\{x_1, \ldots, x_n\}}$, where $x_1, \ldots, x_n$
are the variables occurring in the head of every clause of $pr$.

Concrete behaviors are functions but we denote them by the relation symbol $\longmapsto$
in order to stress the similarities between the concrete semantics and a structural
operational semantics for logic programs defined in (Le Charlier and Van
Hentenryck, 1995).

Definition 2.8 (Concrete Behaviors)
A concrete behavior is a total function $\longmapsto\ : CUD \to CPSS$ mapping every pair
$(\theta, p) \in CUD$ to a canonical program substitution sequence $S$ such that, for every
$\theta' \in Subst(S)$, there exists a standard substitution $\sigma$ such that
$\theta' = \theta\sigma$. We denote by $(\theta, p) \longmapsto S$ the fact that
$\longmapsto$ maps the pair $(\theta, p)$ to $S$. The set of all concrete
behaviors is denoted by CB.

The ordering $\sqsubseteq$ on program substitution sequences is lifted to concrete behaviors
in a standard way (Schmidt, 1988).

Definition 2.9 (Relation $\sqsubseteq$ on Concrete Behaviors)
Let $\longmapsto_1, \longmapsto_2 \in CB$. We define

$\longmapsto_1 \sqsubseteq \longmapsto_2$ iff ($(\theta, p) \longmapsto_1 S_1$ and
    $(\theta, p) \longmapsto_2 S_2$) imply $S_1 \sqsubseteq S_2$,
    for all $(\theta, p) \in CUD$.

The following result is straightforward.

Proposition 2.10
$(CB, \sqsubseteq)$ is a pointed cpo, i.e.,

1. the relation $\sqsubseteq$ on CB is a partial order;
2. CB has a minimum element, which is the concrete behavior $\longmapsto_\bot$ such that
   for all $(\theta, p) \in CUD$, $(\theta, p) \longmapsto_\bot \langle\bot\rangle$;
3. every chain $(\longmapsto_i)_{i \in \mathbb{N}}$ in CB has a least upper bound, denoted by
   $\sqcup_{i=0}^{\infty} \longmapsto_i$. $\sqcup_{i=0}^{\infty} \longmapsto_i$ is the
   concrete behavior $\longmapsto$ such that, for all $(\theta, p) \in CUD$,
   $(\theta, p) \longmapsto \sqcup_{i=0}^{\infty} S_i$, where
   $(\theta, p) \longmapsto_i S_i$ ($\forall i \in \mathbb{N}$).

2.5 Concrete Operations 

We specify here the concrete operations which are used in the definition of the con- 
crete semantics. The choice of these particular operations is motivated by the fact 
that they have useful (i.e., practical) abstract counterparts (see Sections [| ||and||). 
The concrete operations are polymorphic since their exact signature depends on a 
clause c or a literal I or both. 

Let c be a clause, D — {xi, . . . , x n } be the set of all variables occurring in the 
head of c, and D' = {xi, . . . , x m } (n < m) be the set of all variables occurring in c. 

Extension at Clause Entry : EXTC(c, •) : CPS D -> {CPSS D > x CF) 

This operation extends a substitution 9 on the set of variables in D to the set of 

variables in D' . Let 9eCPS D . 

EXTC(c, 9) = (< 19'] >, nocut) 

where XiO' = XiO (Vi : 1 < i < n) and x n +iQ' , . . . , x m & are distinct standard 
variables not belonging to codom(8). 

Restriction at Clause Exit : $\text{RESTRC}(c, \cdot) : (CPSS_{D'} \times CF) \to (CPSS_D \times CF)$
This operation restricts a pair $\langle S, cf \rangle$, representing the result of the execution of $c$ on
the set of variables in $D'$, to the set of variables in $D$. Let $\langle S, cf \rangle \in (CPSS_{D'} \times CF)$.

$\text{RESTRC}(c, \langle S, cf \rangle) = \langle [S'], cf \rangle$ where $S' = S_{/D}$.

Let $l$ be a literal occurring in the body of $c$, $D'' = \{x_{i_1}, \ldots, x_{i_r}\}$ be the set of
variables occurring in $l$, and $D'''$ be equal to $\{x_1, \ldots, x_r\}$.

Restriction before a Call : $\text{RESTRG}(l, \cdot) : CPS_{D''} \to CPS_{D'''}$

This operation expresses a substitution $\theta$ on the parameters $x_{i_1}, \ldots, x_{i_r}$ of a call $l$
in terms of the formal parameters $x_1, \ldots, x_r$ of $l$. Let $\theta \in CPS_{D''}$.

$\text{RESTRG}(l, \theta) = [\{x_1/x_{i_1}\theta, \ldots, x_r/x_{i_r}\theta\}]$.
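The three operations EXTC, RESTRC and RESTRG can be sketched on a toy encoding. Everything in the following block is an illustrative assumption rather than the paper's definition: substitutions are Python dictionaries, standard variables are strings starting with `_`, compound terms are tuples `(functor, arg1, ...)`, a sequence is a pair `(list_of_substitutions, incomplete)`, and the canonical renaming written `[.]` in the text is omitted. The function names are hypothetical.

```python
from itertools import count

def vars_of(term):
    """Standard variables of a term: strings starting with '_' are variables,
    tuples are compound terms (functor, arg1, ...), other strings are constants."""
    if isinstance(term, tuple):
        return set().union(*(vars_of(a) for a in term[1:]))
    return {term} if term.startswith("_") else set()

def codom(theta):
    """Standard variables occurring in the terms bound by theta."""
    return set().union(*(vars_of(t) for t in theta.values()))

def extc(head_vars, clause_vars, theta):
    """EXTC sketch: bind the clause variables that are not in the head
    to distinct fresh standard variables not occurring in codom(theta)."""
    used, fresh, theta2 = codom(theta), count(), dict(theta)
    for x in clause_vars:
        if x not in head_vars:
            v = next(f"_G{i}" for i in fresh if f"_G{i}" not in used)
            used.add(v)
            theta2[x] = v
    return ([theta2], False), "nocut"

def restrc(head_vars, result):
    """RESTRC sketch: restrict every substitution of the sequence to the head variables."""
    (substs, incomplete), cf = result
    return ([{x: th[x] for x in head_vars} for th in substs], incomplete), cf

def restrg(actuals, theta):
    """RESTRG sketch: re-express theta on the actual parameters x_i1..x_ir
    in terms of the formal parameters x1..xr."""
    return {f"x{j}": theta[a] for j, a in enumerate(actuals, start=1)}
```

For instance, with `theta = {"x1": "a", "x2": "_G0"}` and a clause whose only extra variable is `x3`, `extc` binds `x3` to `_G1`, since `_G0` already occurs in the codomain of `theta`.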

Extension of the Result of a Call : $\text{EXTG}(l, \cdot, \cdot) : CPS_{D'} \times CPSS_{D'''} \to CPSS_{D'}$
This operation extends a substitution $\theta$ with a substitution sequence $S$ representing
the result of executing a call $l$ on $\theta$. Hence, it is only used in contexts where
the substitutions that are elements of $S$ are (roughly speaking) instances of $\theta$.
Let $\theta \in CPS_{D'}$. Let $S \in CPSS_{D'''}$ be of the form $<\theta'\sigma_1, \ldots, \theta'\sigma_i, \_>$ where
$x_j\theta' = x_{i_j}\theta$ ($1 \le j \le r$) and the $\sigma_i$ are standard substitutions such that $dom(\sigma_i) \subseteq
codom(\theta')$. Let $\{z_1, \ldots, z_s\} = codom(\theta) \setminus codom(\theta')$. Let $y_{i,1}, \ldots, y_{i,s}$ be distinct
standard variables not belonging to $codom(\theta) \cup codom(\sigma_i)$ ($1 \le i \le Ns(S)$). Let $\rho_i$
be a renaming of the form $\{z_1/y_{i,1}, \ldots, z_s/y_{i,s}, y_{i,1}/z_1, \ldots, y_{i,s}/z_s\}$.

$\text{EXTG}(l, \theta, S) = [<\theta\rho_1\sigma_1, \ldots, \theta\rho_i\sigma_i, \_>]$.

It is easy to see that the value of $\text{EXTG}(l, \theta, S)$ does not depend on the choice of
the $y_{i,j}$. Moreover, it is not defined when $S$ is not of the above-mentioned form.

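Unification of Two Variables : $\text{UNIF-VAR} : CPS_{\{x_1,x_2\}} \to CPSS_{\{x_1,x_2\}}$
Let $\theta \in CPS_{\{x_1,x_2\}}$. This operation unifies $x_1\theta$ with $x_2\theta$.

$\text{UNIF-VAR}(\theta) = <>$ if $x_1\theta$ and $x_2\theta$ are not unifiable,

$\text{UNIF-VAR}(\theta) = [<\theta\sigma>]$ where $\sigma \in mgu(x_1\theta, x_2\theta)$, otherwise.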

Unification of a Variable and a Functor : $\text{UNIF-FUNC}(f, \cdot) : CPS_D \to CPSS_D$
Given a functor $f$ of arity $n-1$ and a substitution $\theta \in CPS_D$ where $D = \{x_1, \ldots, x_n\}$,
the UNIF-FUNC operation unifies $x_1\theta$ with $f(x_2, \ldots, x_n)\theta$.

$\text{UNIF-FUNC}(f, \theta) = <>$ if $x_1\theta$ and $f(x_2, \ldots, x_n)\theta$ are not unifiable,

$\text{UNIF-FUNC}(f, \theta) = [<\theta\sigma>]$ where $\sigma \in mgu(x_1\theta, f(x_2, \ldots, x_n)\theta)$, otherwise.

All operations above are monotonic and continuous. We assume that sets of
program substitutions are endowed with the ordering $\sqsubseteq$ such that $\theta \sqsubseteq \theta'$ iff $\theta = \theta'$.
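The two unification operations can be illustrated with a small textbook unifier. This is a sketch under assumptions that are not part of the paper: standard variables are strings starting with `_`, constants are other strings, compound terms are tuples `(functor, arg1, ...)`, the unifier performs the occurs check, the canonical renaming `[.]` is omitted, and all function names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("_")

def walk(t, s):
    """Follow variable bindings in s until an unbound variable or a non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return any(occurs(v, a, s) for a in t[1:])
    return t == v

def unify(t1, t2):
    """Return a most general unifier as a dict on standard variables, or None."""
    s, stack = {}, [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(b) and not is_var(a):
            a, b = b, a
        if is_var(a):
            if occurs(a, b, s):
                return None
            s[a] = b
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and a[0] == b[0] and len(a) == len(b)):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None
    return s

def resolve(t, s):
    """Apply the unifier s to the term t."""
    t = walk(t, s)
    return (t[0],) + tuple(resolve(a, s) for a in t[1:]) if isinstance(t, tuple) else t

def unif_on(theta, t1, t2):
    """Return <> when t1 and t2 do not unify, and <theta sigma> otherwise."""
    s = unify(t1, t2)
    if s is None:
        return ([], False)
    return ([{x: resolve(t, s) for x, t in theta.items()}], False)

def unif_var(theta):
    return unif_on(theta, theta["x1"], theta["x2"])

def unif_func(f, theta):
    args = tuple(theta[f"x{i}"] for i in range(2, len(theta) + 1))
    return unif_on(theta, theta["x1"], (f,) + args)
```

Both operations return either the empty sequence or a one-element sequence, which matches the two cases of the definitions above.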



2.6 Concrete Semantic Rules 

The concrete semantics of the underlying program P is the least fixpoint of a conti- 
nuous transformation on CB (the set of concrete behaviors) . This transformation is 
defined in terms of a set of semantic rules that naturally extend a concrete behavior 
to a continuous function defining the input/output behavior of every prefix of the 
body of a clause, every clause, every suffix of a procedure and every procedure of 
P. This function is called extended concrete behavior and maps each element of the 
extended concrete underlying domain to a substitution sequence, possibly with cut 
information, as defined below. 

Definition 2.11 (Extended Concrete Underlying Domain)

The extended concrete underlying domain, denoted by $ECUD$, consists of

1. all triples $\langle\theta, g, c\rangle$, where $c$ is a clause of $P$, $g$ is a prefix of the body of $c$, and
$\theta$ is a canonical program substitution over the variables in the head of $c$;

2. all pairs $\langle\theta, c\rangle$, where $c$ is a clause of $P$ and $\theta$ is a canonical program
substitution over the variables in the head of $c$;

3. all pairs $\langle\theta, pr\rangle$, where $pr$ is a suffix of a procedure of $P$ and $\theta$ is a canonical
program substitution over the variables in the head of the clauses of $pr$.

Definition 2.12 (Extended Concrete Behaviors)

An extended concrete behavior is a total function from $ECUD$ to the set $CPSS \cup
(CPSS \times CF)$ such that

1. every triple $\langle\theta, g, c\rangle$ from $ECUD$ is mapped to a program substitution sequence
with cut information $\langle S, cf\rangle$ such that $dom(S)$ is the set of all variables in $c$;

2. every pair $\langle\theta, c\rangle$ from $ECUD$ is mapped to a program substitution sequence
with cut information $\langle S, cf\rangle$ such that $dom(S)$ is the set of variables in the
head of $c$;

3. every pair $\langle\theta, pr\rangle$ from $ECUD$ is mapped to a program substitution sequence
$S$ such that $dom(S)$ is the set of variables in the head of the clauses of $pr$.

The set of extended concrete behaviors is endowed with a structure of pointed
cpo in the obvious way. It is denoted by $ECB$; its elements are denoted by $\longmapsto$.

Let $\longmapsto$ be a concrete behavior. The concrete semantic rules depicted in Figure 2
define an extended concrete behavior derived from $\longmapsto$. This extended concrete
behavior is denoted by the same symbol $\longmapsto$. This does not lead to confusion since
the inputs of the two functions belong to different sets. The definition proceeds by
induction on the syntactic structure of $P$.

The concrete semantic rules model Prolog operational semantics through the
notion of program substitution sequence. Rule R1 defines the program substitution
sequence with cut information at the entry point of a clause. Rules R2 and R3
define the effect of the execution of a cut at the clause level. Rules R4, R5 and
R6 deal with execution of literals; procedure calls are solved by using the concrete
behavior $\longmapsto$ as an oracle. Rule R7 defines the result of a clause. Rules R8 and
R9 define the result of a procedure by structural induction on its suffixes. Rule
R8 deals with the suffix consisting of the last clause only: it simply forgets the
cut information, which is not meaningful at the procedure level. Rule R9 combines
the result of a clause with the (combined) result of the next clauses in the same
procedure: it deals with the execution of a cut at the procedure level. The expression
$\Box_{k=1}^{Nc(S)} S_k$ used in Rules R4, R5 and R6 deserves an explanation: when the sequence
$S$ is incomplete, it is assumed that $S_{Nc(S)} = <\bot>$. This convention is necessary to
propagate the non-termination of $g'$ to $g$.
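The way non-termination and cut propagate through these rules can be sketched as follows. The block is only an illustration under assumptions: sequences are finite or incomplete and encoded as `(list_of_substitutions, incomplete)`, and the concatenation $\langle S, cf\rangle \Box S'$ of Rule R9 is assumed to return $S$ alone when $cf = cut$ and the lazy concatenation of $S$ and $S'$ otherwise. The names `lazy_concat`, `concat_with_cut` and `extend_goal` are hypothetical.

```python
def lazy_concat(seqs):
    """Concatenate sequences left to right and stop after the first incomplete one,
    so that non-termination hides everything to its right."""
    out = []
    for substs, incomplete in seqs:
        out.extend(substs)
        if incomplete:
            return (out, True)
    return (out, False)

def concat_with_cut(clause_result, rest):
    """Assumed reading of <S, cf> [] S' in Rule R9: a cut discards the later clauses."""
    seq, cf = clause_result
    return seq if cf == "cut" else lazy_concat([seq, rest])

def extend_goal(prefix_seq, run_literal):
    """Shape shared by Rules R4 to R6: run the literal on each theta_k and concatenate.
    If the prefix is incomplete, <_|_> is appended as the last operand."""
    substs, incomplete = prefix_seq
    parts = [run_literal(th) for th in substs]
    if incomplete:
        parts.append(([], True))
    return lazy_concat(parts)
```

The appended `([], True)` operand corresponds to the convention $S_{Nc(S)} = <\bot>$: an incomplete prefix yields an incomplete result even when every call on its substitutions terminates.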

The following results are instrumental for proving the well-definedness of the
concrete semantics.

Proposition 2.13 (Properties of the Concrete Semantic Rules)

1. Given a concrete behavior, the concrete semantic rules define a unique
extended concrete behavior, i.e., a unique mapping from $CB$ to $ECB$. This
mapping is continuous.

2. Rules R1 to R6 have a conclusion of the form $\langle\theta, g, c\rangle \longmapsto \langle S, cf\rangle$. In all cases,
$S$ is of the form $<\theta'\sigma_1, \ldots, \theta'\sigma_i, \_>$, where the $\sigma_i$ are standard substitutions
and $\langle <\theta'>, nocut\rangle = \text{EXTC}(c, \theta)$.

Rules R7 to R9 have a conclusion of the form $\langle\theta, \cdot\rangle \longmapsto S$. In all cases, $S$ is
of the form $<\theta\sigma_1, \ldots, \theta\sigma_i, \_>$, where the $\sigma_i$ are standard substitutions.

2.7 Concrete Semantics 

The concrete semantics of the underlying program $P$ is defined as the least fixpoint
of the following concrete transformation.

Definition 2.14 (Concrete Transformation)

The transformation $TCB : CB \to CB$ is defined as follows: for all $\longmapsto \in CB$,

$$\text{T1}\qquad \frac{\begin{array}{c} pr \text{ is a procedure of } P \\ p \text{ is the name of } pr \\ \langle\theta, pr\rangle \longmapsto S \end{array}}{\langle\theta, p\rangle \stackrel{TCB}{\longmapsto} S}$$

where $\stackrel{TCB}{\longmapsto}$ stands for $TCB(\longmapsto)$. Remember that $\langle\theta, pr\rangle \longmapsto S$ is defined by means
of the previous rules which use the concrete behavior $\longmapsto$ as an oracle to solve the
procedure calls.

The transformation TCB is well-defined and continuous. 

Definition 2.15 (Concrete Semantics)

The concrete semantics of the underlying program $P$ is the least concrete behavior
$\longmapsto$ such that

$$\longmapsto \;=\; \stackrel{TCB}{\longmapsto}.$$


2.8 Correctness of the Concrete Semantics

Since OLD-resolution (Lloyd, 1987; Tamaki, 1986) is the standard semantics of
pure Prolog augmented with cut, our concrete semantics and OLD-resolution have
to be proven equivalent. The proof is fairly complex because OLD-resolution is
not compositional. Consequently, the two semantics do not naturally match. The
equivalence proof is given in (Le Charlier et al., 1996). In this section, we only give
the principle of the proof.

1. We assume that OLD-resolution uses standard variables to rename clauses 
apart. The initial queries are also assumed to contain standard variables only. 

2. The notion of incomplete OLD-tree limited to depth $k$ is defined ($IOLD_k$-tree,
for short). Intuitively, an $IOLD_k$-tree is an OLD-tree modified according to
the following rules:

(a) procedure calls may be unfolded only down to depth $k$;

(b) branches that end at a node whose leftmost literal may not be unfolded
are called incomplete;

(c) a depth-first left-to-right traversal of the tree is performed in order to
determine the cuts that are reached by the standard execution and to
prune the tree accordingly; see (Lloyd, 1987);



(d) the traversal ends when the whole tree has been visited or when a node 
that may not be unfolded is reached; 

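(e) the branches on the right of the left-most incomplete branch are pruned
(if such a branch exists).

$$\text{R1}\qquad \frac{g ::= <>}{\langle\theta, g, c\rangle \longmapsto \text{EXTC}(c, \theta)}$$

$$\text{R2}\qquad \frac{\begin{array}{c} g ::= g', ! \\ \langle\theta, g', c\rangle \longmapsto \langle S, cf\rangle \\ S \in \{<\bot>, <>\} \end{array}}{\langle\theta, g, c\rangle \longmapsto \langle S, cf\rangle}$$

$$\text{R3}\qquad \frac{\begin{array}{c} g ::= g', ! \\ \langle\theta, g', c\rangle \longmapsto \langle S, cf\rangle \\ S = <\theta'> :: S' \end{array}}{\langle\theta, g, c\rangle \longmapsto \langle <\theta'>, cut\rangle}$$

$$\text{R4}\qquad \frac{\begin{array}{c} g ::= g', l \\ l ::= x_{i_1} = x_{i_2} \\ \langle\theta, g', c\rangle \longmapsto \langle S, cf\rangle \\ S = <\theta_1, \ldots, \theta_i, \_> \\ \left.\begin{array}{l} \theta'_k = \text{RESTRG}(l, \theta_k) \\ S'_k = \text{UNIF-VAR}(\theta'_k) \\ S_k = \text{EXTG}(l, \theta_k, S'_k) \end{array}\right\} (1 \le k \le Ns(S)) \end{array}}{\langle\theta, g, c\rangle \longmapsto \langle \Box_{k=1}^{Nc(S)} S_k, cf\rangle}$$

$$\text{R5}\qquad \frac{\begin{array}{c} g ::= g', l \\ l ::= x_{i_1} = f(x_{i_2}, \ldots, x_{i_n}) \\ \langle\theta, g', c\rangle \longmapsto \langle S, cf\rangle \\ S = <\theta_1, \ldots, \theta_i, \_> \\ \left.\begin{array}{l} \theta'_k = \text{RESTRG}(l, \theta_k) \\ S'_k = \text{UNIF-FUNC}(f, \theta'_k) \\ S_k = \text{EXTG}(l, \theta_k, S'_k) \end{array}\right\} (1 \le k \le Ns(S)) \end{array}}{\langle\theta, g, c\rangle \longmapsto \langle \Box_{k=1}^{Nc(S)} S_k, cf\rangle}$$

$$\text{R6}\qquad \frac{\begin{array}{c} g ::= g', l \\ l ::= p(x_{i_1}, \ldots, x_{i_n}) \\ \langle\theta, g', c\rangle \longmapsto \langle S, cf\rangle \\ S = <\theta_1, \ldots, \theta_i, \_> \\ \left.\begin{array}{l} \theta'_k = \text{RESTRG}(l, \theta_k) \\ \langle\theta'_k, p\rangle \longmapsto S'_k \\ S_k = \text{EXTG}(l, \theta_k, S'_k) \end{array}\right\} (1 \le k \le Ns(S)) \end{array}}{\langle\theta, g, c\rangle \longmapsto \langle \Box_{k=1}^{Nc(S)} S_k, cf\rangle}$$

$$\text{R7}\qquad \frac{\begin{array}{c} c ::= h \text{ :- } g. \\ \langle\theta, g, c\rangle \longmapsto \langle S, cf\rangle \end{array}}{\langle\theta, c\rangle \longmapsto \text{RESTRC}(c, \langle S, cf\rangle)}$$

$$\text{R8}\qquad \frac{\begin{array}{c} pr ::= c \\ \langle\theta, c\rangle \longmapsto \langle S, cf\rangle \end{array}}{\langle\theta, pr\rangle \longmapsto S}$$

$$\text{R9}\qquad \frac{\begin{array}{c} pr ::= c \; pr' \\ \langle\theta, c\rangle \longmapsto \langle S, cf\rangle \\ \langle\theta, pr'\rangle \longmapsto S' \end{array}}{\langle\theta, pr\rangle \longmapsto \langle S, cf\rangle \Box S'}$$

Fig. 2. Concrete semantic rules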



3. Assuming a query of the form $p(t_1, \ldots, t_n)$ and denoting the concrete
behavior $TCB^k(\longmapsto_\bot)$ by $\longmapsto_k$, it can be shown that the sequence of computed
answer substitutions $<\sigma_1, \ldots, \sigma_i, \_>$ for the $IOLD_k$-tree of $p(t_1, \ldots, t_n)$ is
such that $\langle\theta, p\rangle \longmapsto_k [<\theta\sigma_1, \ldots, \theta\sigma_i, \_>]$ where $\theta = \{x_1/t_1, \ldots, x_n/t_n\}$.






4. The equivalence of our concrete semantics and OLD-resolution is a simple
consequence of the previous result.

For every query $p(t_1, \ldots, t_n)$, $<\sigma_1, \ldots, \sigma_i, \_>$ is the sequence of computed
answer substitutions of $p(t_1, \ldots, t_n)$ according to OLD-resolution if and only
if $\langle\theta, p\rangle \longmapsto [<\theta\sigma_1, \ldots, \theta\sigma_i, \_>]$ where $\theta = \{x_1/t_1, \ldots, x_n/t_n\}$ and $\longmapsto$ is
the concrete behavior of the program according to our concrete semantics.

In fact, the correctness of our concrete semantics should be close to obvious to 
anyone who knows about both Prolog and denotational semantics. So, the equiva- 
lence proof is a formal technical exercise, which adds little to our basic understan- 
ding of the concrete semantics. 



2.9 Related Works 



Denotational semantics for Prolog have been proposed before (De Bruin and De Vink,
1989; Debray and Mishra, 1988; Jones and Mycroft, 1984). Our concrete semantics
is not intended to improve on these works from the language understanding
standpoint. Instead, it is merely designed as a basis for an abstract interpretation
framework; in particular, it uses concrete operations that are as close as possible to
the operations used by the structural operational semantics presented in (Le Charlier
and Van Hentenryck, 1995), upon which our previous frameworks are based.
This allows us to reuse much of the material from our existing abstract domains
and generic algorithms; see (Englebert et al., 1993; Le Charlier et al., 1991; Le
Charlier and Van Hentenryck, 1994; Le Charlier and Van Hentenryck, 1995). The
idea of distinguishing between finite, incomplete, and infinite sequences is originally
due to Baudinet (1992).



3 Abstract semantics 

As we have already explained in the introduction, our abstract semantics is not 
defined as a least fixpoint of an abstract transformation but instead as a set of 
post-fixpoints that fulfill a safety requirement, namely pre-consistency. Moreover, 
the abstract domains are assumed to represent so-called chain-closed sets of concrete 
elements as specified below. 



3.1 Abstract Domains 

We state here the mathematical assumptions that are required to be satisfied by 
the abstract domains. Specific abstract domains will be described in Section |^. 

Abstract Substitutions. For every finite set $D$ of program variables, we denote
by $CS_D$ the set $\wp(PS_D)$. A domain of abstract substitutions is a family of sets $AS_D$
indexed by the finite sets $D$ of program variables. Elements of $AS_D$ are called
abstract substitutions; they are denoted by $\beta$. Each set $AS_D$ is endowed with a partial
order $\le$ and a monotonic concretization function $Cc : AS_D \to CS_D$ associating to
each abstract substitution $\beta$ the set $Cc(\beta)$ of program substitutions it denotes.

Abstract Sequences. For every finite set $D$ of program variables, we denote by
$CSS_D$ the set $\wp(PSS_D)$. Abstract sequences denote chain-closed subsets of $CSS_D$.
A domain of abstract sequences is a family of sets $ASS_D$ indexed by the finite sets
$D$ of program variables. Elements of $ASS_D$ are called abstract sequences; they are
denoted by $B$. Each set $ASS_D$ is endowed with a partial order $\le$ and a monotonic
concretization function $Cc : ASS_D \to CSS_D$. Moreover, the following properties
are required to be satisfied: (1) every $ASS_D$ contains an abstract sequence $B_\bot$ such
that $<\bot> \in Cc(B_\bot)$; (2) for every $B \in ASS_D$, $Cc(B)$ is chain-closed, i.e., for
every chain $(S_i)_{i \in \mathbb{N}}$ of elements of $Cc(B)$, the limit $\sqcup_{i=0}^{\infty} S_i$ also belongs to $Cc(B)$.
The disjoint union of all the $ASS_D$ is denoted by $ASS$.

Abstract Sequences with Cut Information. Let $CSSC_D$ denote $\wp(PSS_D \times
CF)$. A domain of abstract sequences with cut information is a family of sets
$ASSC_D$ indexed by the finite sets $D$ of program variables. Elements of $ASSC_D$ are
called abstract sequences with cut information; they are denoted by $C$. Every set
$ASSC_D$ is endowed with a partial order $\le$ and a monotonic concretization function
$Cc : ASSC_D \to CSSC_D$. The disjoint union of all the $ASSC_D$ is denoted by $ASSC$.

Abstract Behaviors. Abstract behaviors are the abstract counterpart of the
concrete behaviors introduced in Section 2.4. They are endowed with a weaker
mathematical structure as described below. As in the case of concrete behaviors, a fixed
underlying program $P$ is assumed.

Definition 3.1 (Abstract Underlying Domain)

The abstract underlying domain, denoted by $AUD$, is the set of all pairs $\langle\beta, p\rangle$ such
that $p$ is a procedure name in $P$ of arity $n$ and $\beta \in AS_{\{x_1, \ldots, x_n\}}$.

Definition 3.2 (Abstract Behaviors)

An abstract behavior is a total function $sat : AUD \to ASS$ mapping each pair
$\langle\beta, p\rangle \in AUD$ to an abstract sequence $B$ with $B \in ASS_{\{x_1, \ldots, x_n\}}$, where $n$ is the
arity of $p$. The set of all abstract behaviors is denoted by $AB$. The set $AB$ is endowed
with the partial ordering $\le$ such that, for all $sat_1, sat_2 \in AB$:

$sat_1 \le sat_2$ iff $sat_1\langle\beta, p\rangle \le sat_2\langle\beta, p\rangle$, $\forall\langle\beta, p\rangle \in AUD$.

It would be reasonable to assume that abstract behaviors are monotonic functions
but this is not necessary for the safety results. The notation $sat$ stands for "set of
abstract tuples". It is used because the abstract interpretation algorithm, derived
from the abstract semantics, actually computes a set of tuples of the form $\langle\beta, p, B\rangle$,
i.e., a part of the table of an abstract behavior.



3.2 Abstract Operations 



In this section, we give the specification of the primitive abstract operations used 
by the abstract semantics. The specifications are safety assumptions which, roughly
speaking, state that the abstract operations safely simulate the corresponding con-
crete ones. In particular, operations EXTC, RESTRG, RESTRC, UNIF-VAR, UNIF-FUNC 
are faithful abstract counterparts of the corresponding concrete operations. Hence, 
their specification simply states that, if some concrete input belongs to the con- 
cretization of their (abstract) input, then the corresponding concrete output be- 
longs to the concretization of their (abstract) output. Moreover, overloading the 
operation names is natural in these cases. Operation AI-CUT deals with the cut; its 
specification is also straightforward. Operations EXTGS and CONC are related to the 
concrete operations EXTG and □ in a more involved way. We will discuss them in 
more detail. Finally, operations SUBST and SEQ are simple conversion operations to 
convert an abstract domain into another. 



Let us specify the operations, using the notations of Section 2.5.



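Extension at Clause Entry : $\text{EXTC}(c, \cdot) : AS_D \to ASSC_{D'}$

Let $\beta \in AS_D$ and $\theta \in CPS_D$. The following property is required to hold.

$\theta \in Cc(\beta) \Rightarrow \text{EXTC}(c, \theta) \in Cc(\text{EXTC}(c, \beta))$.

Restriction at Clause Exit : $\text{RESTRC}(c, \cdot) : ASSC_{D'} \to ASSC_D$
Let $C \in ASSC_{D'}$ and $\langle S, cf\rangle \in (CPSS_{D'} \times CF)$.

$\langle S, cf\rangle \in Cc(C) \Rightarrow \text{RESTRC}(c, \langle S, cf\rangle) \in Cc(\text{RESTRC}(c, C))$.

Restriction before a Call : $\text{RESTRG}(l, \cdot) : AS_{D''} \to AS_{D'''}$
Let $\beta \in AS_{D''}$ and $\theta \in CPS_{D''}$.

$\theta \in Cc(\beta) \Rightarrow \text{RESTRG}(l, \theta) \in Cc(\text{RESTRG}(l, \beta))$.

Unification of Two Variables : $\text{UNIF-VAR} : AS_{\{x_1,x_2\}} \to ASS_{\{x_1,x_2\}}$
Let $\beta \in AS_{\{x_1,x_2\}}$ and $\theta \in CPS_{\{x_1,x_2\}}$.

$\theta \in Cc(\beta) \Rightarrow \text{UNIF-VAR}(\theta) \in Cc(\text{UNIF-VAR}(\beta))$.

Unification of a Variable and a Functor : $\text{UNIF-FUNC}(f, \cdot) : AS_D \to ASS_D$
Let $\beta \in AS_D$ and $\theta \in CPS_D$. Let also $f$ be a functor of arity $n - 1$.

$\theta \in Cc(\beta) \Rightarrow \text{UNIF-FUNC}(f, \theta) \in Cc(\text{UNIF-FUNC}(f, \beta))$.

Abstract Interpretation of the Cut : $\text{AI-CUT} : ASSC_{D'} \to ASSC_{D'}$
Let $C \in ASSC_{D'}$, $\theta \in CPS_{D'}$, $S \in CPSS_{D'}$, $cf \in CF$.

$\langle <>, cf\rangle \in Cc(C) \Rightarrow \langle <>, cf\rangle \in Cc(\text{AI-CUT}(C))$,
$\langle <\bot>, cf\rangle \in Cc(C) \Rightarrow \langle <\bot>, cf\rangle \in Cc(\text{AI-CUT}(C))$,
$\langle <\theta> :: S, cf\rangle \in Cc(C) \Rightarrow \langle <\theta>, cut\rangle \in Cc(\text{AI-CUT}(C))$.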
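The three implications specifying AI-CUT mirror the concrete Rules R2 and R3. As a sanity check, they can be instantiated on a deliberately trivial "abstract domain" in which an abstract value is a finite set of concrete results and the concretization is the identity. This toy domain, the tuple encoding `((substs, incomplete), cf)` and the names `cut` and `ai_cut` are assumptions made for illustration only.

```python
def cut(result):
    """Concrete effect of '!' (Rules R2 and R3) on a sequence with cut information."""
    (substs, incomplete), cf = result
    if not substs:                        # S is <> or <_|_>: the cut is not reached
        return result
    return ((substs[:1], False), "cut")   # S = <theta>::S': keep only theta

def ai_cut(abstract):
    """AI-CUT for the toy domain where an abstract value is a finite set of
    concrete results and Cc is the identity."""
    return {cut(r) for r in abstract}
```

In a realistic domain AI-CUT would operate on an abstract description instead of enumerating concrete results; the sketch only shows which concrete results its output must cover.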



Extension of the Result of a Call : $\text{EXTGS}(l, \cdot, \cdot) : ASSC_{D'} \times ASS_{D'''} \to ASSC_{D'}$
The specification of this operation is more complex because it abstracts in a single
operation the calculation of all sequences $S_k = \text{EXTG}(l, \theta_k, S'_k)$ and of their
concatenation $\Box_{k=1}^{Nc(S)} S_k$, performed by the rules R4, R5, R6 (see Figure 2). At the
abstract level, it may be too expensive or even impossible to simulate the execution
of $l$ for all elements of $S$, as defined in the rules. Therefore, we abstract $S$ to its
substitutions, losing the ordering. The abstract execution will be the following.
Assuming that $C$ abstracts the program substitution sequence with cut information
$\langle S, cf\rangle$ before $l$, we compute $\beta = \text{SUBST}(C)$; then we compute $\beta' = \text{RESTRG}(l, \beta)$
and, subsequently, we get the abstract sequence $B$ resulting from the abstract
execution of $l$ with input $\beta'$. The set $Cc(B)$ contains all sequences $S'_k$ of rules R4, R5,
R6. Then, an over-approximation of the set of all possible values $\Box_{k=1}^{Nc(S)} S_k$ is
computed from the information provided by $C$ and $B$. This is realized by the following
operation EXTGS. Let $C \in ASSC_{D'}$, $B \in ASS_{D'''}$, $\langle S, cf\rangle \in (CPSS_{D'} \times CF)$ and
$S'_1, \ldots, S'_{Ns(S)} \in CPSS_{D'''}$.

$$\left.\begin{array}{l} \langle S, cf\rangle \in Cc(C), \\ S = <\theta_1, \ldots, \theta_i, \_>, \\ \forall k : 1 \le k \le Ns(S) : S'_k \in Cc(B) \\ \quad \text{and } S_k = \text{EXTG}(l, \theta_k, S'_k) \end{array}\right\} \Rightarrow \langle \Box_{k=1}^{Nc(S)} S_k, cf\rangle \in Cc(\text{EXTGS}(l, C, B)).$$



Abstract Lazy Concatenation : $\text{CONC} : (AS_D \times ASSC_D \times ASS_D) \to ASS_D$
This operation is the abstract counterpart of the concatenation operation $\Box$. It
is however extended with an additional argument to increase the accuracy. Let
$B' = \text{CONC}(\beta, C, B)$ where $\beta$ describes a set of input substitutions for a procedure;
$C$ describes the set of substitution sequences with cut information obtained by
executing a clause of the procedure on $\beta$; $B$ describes the set of substitution
sequences obtained by executing the subsequent clauses of the procedure on $\beta$. Then,
$B'$ describes the set of substitution sequences obtained by concatenating the results
according to the concrete concatenation operation $\Box$.

Let us discuss a simple example to understand the role of $\beta$. Assume that

$Cc(C) = \{\langle <>, nocut\rangle, \langle <\{x_1/a\}>, nocut\rangle\}$ and $Cc(B) = \{<>, <\{x_1/b\}>\}$.

If the input mode of $x_1$ is unknown, it must be assumed that all combinations of
elements in $Cc(C)$ and $Cc(B)$ are possible. Thus,

$Cc(B') = \{<>, <\{x_1/a\}>, <\{x_1/b\}>, <\{x_1/a\}, \{x_1/b\}>\}$.

On the contrary, if the input mode of $x_1$ is known to be ground, the outputs
$\langle <\{x_1/a\}>, nocut\rangle$ and $<\{x_1/b\}>$ are incompatible since $x_1$ cannot be bound
to both $a$ and $b$ in the input substitution. In this case, we have

$Cc(B') = \{<>, <\{x_1/a\}>, <\{x_1/b\}>\}$.
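The two results of this example can be reproduced by brute-force enumeration. The sketch below is an illustration only: each substitution is abbreviated by the value it gives to $x_1$, all sequences are finite, and a ground input is modelled by the requirement that all answers of a combination agree on $x_1$. The function name `possible_results` is hypothetical.

```python
from itertools import product

def possible_results(cc_C, cc_B, x1_ground):
    """Enumerate <S1, cf> [] S2 for S1 in Cc(C) and S2 in Cc(B).
    Sequences are tuples of the values bound to x1. When x1 is ground on input,
    every answer leaves x1 unchanged, so a combination whose answers bind x1
    to two different values cannot arise from a single input."""
    results = set()
    for (s1, cf), s2 in product(cc_C, cc_B):
        if x1_ground and len(set(s1) | set(s2)) > 1:
            continue
        results.add(s1 if cf == "cut" else s1 + s2)
    return results

cc_C = {((), "nocut"), (("a",), "nocut")}
cc_B = {(), ("b",)}
```

Calling `possible_results(cc_C, cc_B, x1_ground=False)` yields the four sequences of the first case, while `x1_ground=True` drops the sequence containing both `a` and `b`.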

The first argument $\beta$ of the operation CONC provides information on the input
values: it may be useful to improve the accuracy of the result. The above discussion
motivates the following specification of operation CONC. Note that the statement
$(\exists\sigma \in SS : \theta' = \theta\sigma)$ is abbreviated by $\theta' \le \theta$ in the specification. Let $\beta \in AS_D$,
$C \in ASSC_D$, $B \in ASS_D$, $\theta \in CPS_D$, $\langle S_1, cf\rangle \in (CPSS_D \times CF)$ and $S_2 \in CPSS_D$.

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle S_1, cf\rangle \in Cc(C), \\ S_2 \in Cc(B), \\ \forall\theta' \in Subst(S_1) \cup Subst(S_2) : \theta' \le \theta \end{array}\right\} \Rightarrow \langle S_1, cf\rangle \Box S_2 \in Cc(\text{CONC}(\beta, C, B)).$$



Operation $\text{SEQ} : ASSC_D \to ASS_D$

This operation forgets the cut information contained in an abstract sequence with
cut information $C$. It is applied to the result of the last clause of a procedure before
combining this result with the results of the other clauses.
Let $C \in ASSC_D$ and $\langle S, cf\rangle \in (CPSS_D \times CF)$.

$\langle S, cf\rangle \in Cc(C) \Rightarrow S \in Cc(\text{SEQ}(C))$.

Operation $\text{SUBST} : ASSC_{D'} \to AS_{D'}$

This operation forgets still more information. It extracts the "abstract substitution
part" of $C$. It is applied before executing a literal in a clause. See operation EXTGS.
Let $C \in ASSC_{D'}$ and $\langle S, cf\rangle \in (CPSS_{D'} \times CF)$.

$\langle S, cf\rangle \in Cc(C) \Rightarrow Subst(S) \subseteq Cc(\text{SUBST}(C))$.



3.3 Abstract Semantics

We are now in position to present the abstract semantics. Note that we are not 
concerned with algorithmic issues here: they are dealt with in Section ||. 



Extended Abstract Behaviors. Extended abstract behaviors are the abstract
counterpart of the concrete extended behaviors defined in Section 2.6.



Definition 3.3 (Extended Abstract Underlying Domain)

The extended abstract underlying domain, denoted by $EAUD$, consists of

1. all triples $\langle\beta, g, c\rangle$, where $c$ is a clause of $P$, $g$ is a prefix of the body of $c$,
$\beta \in AS_D$, and $D$ is the set of variables in the head of $c$;

2. all pairs $\langle\beta, c\rangle$, where $c$ is a clause of $P$, $\beta \in AS_D$, and $D$ is the set of variables
in the head of $c$;

3. all pairs $\langle\beta, pr\rangle$, where $pr$ is a procedure of $P$ or a suffix of a procedure of $P$,
$\beta \in AS_D$, and $D$ is the set of variables in the head of the clauses of $pr$.



Definition 3.4 (Extended Abstract Behaviors)

An extended abstract behavior is a function from $EAUD$ to $ASS \cup ASSC$ such that

1. every triple $\langle\beta, g, c\rangle$ from $EAUD$ is mapped to an abstract sequence with cut
information $C \in ASSC_{D'}$, where $D'$ is the set of all variables in $c$;

2. every pair $\langle\beta, c\rangle$ from $EAUD$ is mapped to an abstract sequence with cut
information $C \in ASSC_D$, where $D$ is the set of variables in the head of $c$;

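3. every pair $\langle\beta, pr\rangle$ from $EAUD$ is mapped to an abstract sequence $B \in ASS_D$,
where $D$ is the set of variables in the head of the clauses of $pr$.

$$TAB(sat)\langle\beta, p\rangle = E(sat)\langle\beta, pr\rangle$$
where $pr$ is the procedure defining $p$,

$$E(sat)\langle\beta, pr\rangle = \text{SEQ}(C) \qquad \text{if } pr ::= c$$
where $C = E(sat)\langle\beta, c\rangle$,

$$E(sat)\langle\beta, pr\rangle = \text{CONC}(\beta, C, B) \qquad \text{if } pr ::= c, pr'$$
where $B = E(sat)\langle\beta, pr'\rangle$ and $C = E(sat)\langle\beta, c\rangle$,

$$E(sat)\langle\beta, c\rangle = \text{RESTRC}(c, C)$$
where $C = E(sat)\langle\beta, g, c\rangle$ and $g$ is the body of $c$,

$$E(sat)\langle\beta, <>, c\rangle = \text{EXTC}(c, \beta)$$

$$E(sat)\langle\beta, (g, !), c\rangle = \text{AI-CUT}(C)$$
where $C = E(sat)\langle\beta, g, c\rangle$,

$$E(sat)\langle\beta, (g, l), c\rangle = \text{EXTGS}(l, C, B)$$
where
$$B = \begin{cases} \text{UNIF-VAR}(\beta') & \text{if } l ::= x_{i_1} = x_{i_2} \\ \text{UNIF-FUNC}(f, \beta') & \text{if } l ::= x_{i_1} = f(x_{i_2}, \ldots, x_{i_n}) \\ sat\langle\beta', p\rangle & \text{if } l ::= p(x_{i_1}, \ldots, x_{i_n}) \end{cases}$$
$\beta' = \text{RESTRG}(l, \beta'')$, $\beta'' = \text{SUBST}(C)$, and $C = E(sat)\langle\beta, g, c\rangle$.

Fig. 3. The abstract transformation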
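The recursive structure of Figure 3 can be made explicit with a small interpreter that is parametric in the abstract operations. The block below is a sketch, not the paper's algorithm: the program encoding (a dictionary from procedure names to lists of clauses, each clause a dictionary with a `name` and a `body`), the literal encoding (`"!"`, `("unif_var", ...)`, `("unif_func", f, ...)`, `("call", p, ...)`) and the class `SymbolicOps`, which merely records the operations applied as nested tuples, are all hypothetical.

```python
def E_goal(sat, ops, beta, goal, clause):
    """E(sat)<beta, g, c> for a body prefix g given as a list of literals."""
    if not goal:
        return ops.extc(clause, beta)
    *prefix, lit = goal
    C = E_goal(sat, ops, beta, prefix, clause)
    if lit == "!":
        return ops.ai_cut(C)
    beta1 = ops.restrg(lit, ops.subst(C))
    if lit[0] == "unif_var":
        B = ops.unif_var(beta1)
    elif lit[0] == "unif_func":
        B = ops.unif_func(lit[1], beta1)
    else:                                  # ("call", p, ...): look the result up in sat
        B = sat(beta1, lit[1])
    return ops.extgs(lit, C, B)

def E_clause(sat, ops, beta, clause):
    return ops.restrc(clause, E_goal(sat, ops, beta, clause["body"], clause))

def E_proc(sat, ops, beta, clauses):
    """E(sat)<beta, pr> for a procedure suffix given as a non-empty list of clauses."""
    C = E_clause(sat, ops, beta, clauses[0])
    if len(clauses) == 1:
        return ops.seq(C)
    return ops.conc(beta, C, E_proc(sat, ops, beta, clauses[1:]))

def TAB(sat, ops, program):
    return lambda beta, p: E_proc(sat, ops, beta, program[p])

class SymbolicOps:
    """Toy 'domain' that records which operation would be applied, as nested tuples."""
    def __getattr__(self, name):
        def op(*args):
            shown = tuple(a["name"] if isinstance(a, dict) else a for a in args)
            return (name.upper(),) + shown
        return op
```

For a procedure `p` with clauses `c1` (body `q, !`) and `c2` (empty body), applying `TAB` with `SymbolicOps` produces a term of the shape `CONC(beta, RESTRC(c1, AI_CUT(EXTGS(...))), SEQ(RESTRC(c2, EXTC(c2, beta))))`, which follows the equations of the figure from the outside in.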



The set of extended abstract behaviors is endowed with a structure of partial or- 
der in the obvious way. It is denoted by EAB and its elements are denoted by esat. 

Abstract Transformation. The abstract semantics is defined in terms of two
semantic functions that are depicted in Figure 3. The first function $E : AB \to EAB$
maps abstract behaviors to extended abstract behaviors. It is the abstract
counterpart of the concrete semantic rules of Figure 2. The second function $TAB : AB \to
AB$ transforms an abstract behavior into another abstract behavior. It is the
abstract counterpart of Rule T1 in Definition 2.14.

Abstract Semantics. The abstract semantics is defined as the set of all abstract 
behaviors that are both post-fixpoints of the abstract transformation TAB and 
pre-consistent. The corresponding definitions are given first; then the rationale un- 
derlying the definitions is discussed. 

Definition 3.5 (Post-Fixpoints of TAB)

An abstract behavior $sat \in AB$ is called a post-fixpoint of $TAB$ if and only if
$TAB(sat) \le sat$, i.e., if and only if

$TAB(sat)\langle\beta, p\rangle \le sat\langle\beta, p\rangle$, $\forall\langle\beta, p\rangle \in AUD$.

Definition 3.6 (Pre-Consistent Abstract Behaviors)

Let $\longmapsto$ be the concrete semantics of the underlying program, according to
Definition 2.15. An abstract behavior $sat \in AB$ is said to be pre-consistent with respect
to $\longmapsto$ if and only if there exists a concrete behavior $\longmapsto'$ such that

$\longmapsto' \sqsubseteq \longmapsto$

and such that, for all $\langle\beta, p\rangle \in AUD$ and $\langle\theta, p\rangle \in CUD$,

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle\theta, p\rangle \longmapsto' S \end{array}\right\} \Rightarrow S \in Cc(sat\langle\beta, p\rangle).$$

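In the next section, we show that any pre-consistent post-fixpoint $sat$ of $TAB$ is a
safe approximation of the concrete semantics, i.e., it is such that for all $\langle\beta, p\rangle \in AUD$
and $\langle\theta, p\rangle \in CUD$,

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle\theta, p\rangle \longmapsto S \end{array}\right\} \Rightarrow S \in Cc(sat\langle\beta, p\rangle).$$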



The abstract semantics is defined as the set of all pre-consistent post-fixpoints. 
Indeed, under the current hypotheses on the abstract domains, there is no straight- 
forward way to choose a "best" abstract behavior among all pre-consistent post- 
fixpoints. Thus, we consider the problem of computing a reasonably accurate post- 
fixpoint as a pragmatic issue to be solved at the algorithmic level. In fact, the 
abstract interpretation algorithm presented in Section ^ is an improvement of the 
following construction: define the abstract behavior $sat_\bot$ by

$sat_\bot\langle\beta, p\rangle = B_\bot$, $\forall\langle\beta, p\rangle \in AUD$.

Assume that the domain of abstract sequences is endowed with an upper-bound
operation $UB : ASS_D \times ASS_D \to ASS_D$ (not necessarily a least upper bound). For
every $sat_1, sat_2 \in AB$, we define $UB(sat_1, sat_2)$ by

$UB(sat_1, sat_2)\langle\beta, p\rangle = UB(sat_1\langle\beta, p\rangle, sat_2\langle\beta, p\rangle)$, $\forall\langle\beta, p\rangle \in AUD$.

Let $j$ be an arbitrarily chosen natural number. An infinite sequence of pre-consistent
abstract behaviors $sat_0, \ldots, sat_i, \ldots$ is defined as follows:

$sat_0 = sat_\bot$,

$sat_{i+1} = TAB(sat_i)$ ($0 \le i < j$),

$sat_{i+1} = UB(sat_i, TAB(sat_i))$ ($j \le i$).

The abstract behaviors $sat_i$ are all pre-consistent because $sat_\bot$ is pre-consistent by
construction, every application of $TAB$ maintains pre-consistency (as proven in the
next section), and each application of $UB$ produces an abstract behavior whose
concretization contains the concretizations of the arguments. Moreover, assuming that
every partial order $ASS_D$ is finite or satisfies the finite ascending chain property,
the sequence $sat_0, \ldots, sat_i, \ldots$ has a least upper bound which is the desired
pre-consistent post-fixpoint. In case the $ASS_D$ contains chains with infinitely many
distinct elements, $UB$ must be a widening operator (Cousot and Cousot, 1992c).

The sequence from $sat_0$ to $sat_j$ is not ascending in general. In fact, $sat_\bot$ is not
the minimum of $AB$ and $TAB$ is not necessarily monotonic or extensive (i.e.,
$sat \le TAB(sat)$ does not always hold). From step $0$ to $j$, the computation of
the $sat_i$ simulates as closely as possible the computation of the least fixpoint of
the concrete transformation. From step $j$ to convergence, all iterates are "lumped"
together. All concrete behaviors $\longmapsto_j, \longmapsto_{j+1}, \ldots$ of the Kleene sequence of the
concrete semantics are thus included in the concretization of the final post-fixpoint $sat$.
So, $sat$ describes properties that are true not only for the concrete semantics $\longmapsto$
but also for its approximations $\longmapsto_j, \longmapsto_{j+1}, \ldots$. The choice of $j$ is a compromise:
a low value ensures a faster convergence while a high value provides a better accu- 
racy. The abstract interpretation algorithm presented in Section ^ does not iterate 
globally over TAB. It locally iterates over E for every needed input pattern (f3,p) 
and uses different values of j for different input patterns. Depending on the par- 
ticular abstract domain, the value can be guessed more or less cleverly. This is the 
role of the special widening operator of Definition 4.1. A sample widening operator 
is described in Section showing how the value of j can be guessed in the case 
of a practical abstract domain. 
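The global construction of the sequence $sat_0, sat_1, \ldots$ can be sketched on a toy domain. The block below is an illustration under assumptions, not the algorithm of the paper: abstract behaviors are finite dictionaries, abstract values are `frozenset` objects ordered by inclusion, $UB$ is set union, the function `tab` is an arbitrary stand-in for $TAB$, and the iteration stops at the first post-fixpoint it meets (possibly before step $j$, which is harmless since every $sat_i$ is pre-consistent).

```python
def iterate(tab, sat_bot, ub, leq, j, max_steps=1000):
    """sat_0 = sat_bot; plain TAB steps while i < j; UB-accumulated steps afterwards.
    Returns the first post-fixpoint found (TAB(sat) <= sat pointwise) and its index."""
    sat = dict(sat_bot)
    for i in range(max_steps):
        nxt = tab(sat)
        if all(leq(nxt[k], sat[k]) for k in sat):
            return sat, i
        sat = nxt if i < j else {k: ub(sat[k], nxt[k]) for k in sat}
    raise RuntimeError("no post-fixpoint reached within max_steps")

def tab(sat):
    """Hypothetical stand-in for TAB on the toy domain: adds 0 and successors below 4."""
    return {"p": frozenset({0}) | frozenset(x + 1 for x in sat["p"] if x < 3)}

sat, steps = iterate(tab, {"p": frozenset()}, frozenset.union, frozenset.issubset, j=2)
```

With a domain containing infinite ascending chains, `ub` would have to be replaced by a widening for the loop to terminate, as noted in the text.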



3.4 Safety of the Abstract Semantics

We prove here the safety of our abstract semantics. First, we formally define the
notion of safe approximation. Then, we show that the abstract transformation is
safe in the sense that, whenever $sat$ safely approximates $\longmapsto$, $TAB(sat)$ safely
approximates $\stackrel{TCB}{\longmapsto}$ (Theorem 3.8). From this basic result, we deduce that $TAB$
transforms pre-consistent abstract behaviors into other pre-consistent abstract behaviors
(Theorem 3.10), and that, when $sat$ is a post-fixpoint of the abstract transformation
which safely approximates a concrete behavior $\longmapsto$, it also safely approximates the
concrete behavior $\stackrel{TCB}{\longmapsto}$ (Theorem 3.11). Theorem 3.12 states that abstract
behaviors are, roughly speaking, chain-closed with respect to concrete behaviors. Finally,
Theorem 3.13 states our main result, i.e., every pre-consistent post-fixpoint of the
abstract transformation safely approximates the concrete semantics.



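Definition 3.7 (Safe Approximation)

Let $\longmapsto \in CB$ and $sat \in AB$. The abstract behavior $sat$ safely approximates the
concrete behavior $\longmapsto$ if and only if, for all $\langle\theta, p\rangle \in CUD$ and $\langle\beta, p\rangle \in AUD$, the
following implication holds:

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle\theta, p\rangle \longmapsto S \end{array}\right\} \Rightarrow S \in Cc(sat\langle\beta, p\rangle).$$

Similarly, let $\longmapsto \in ECB$ and $esat \in EAB$. The extended abstract behavior $esat$
safely approximates $\longmapsto$ if and only if, for all $\langle\theta, pr\rangle, \langle\theta, c\rangle, \langle\theta, g, c\rangle \in ECUD$ and
$\langle\beta, pr\rangle, \langle\beta, c\rangle, \langle\beta, g, c\rangle \in EAUD$, the following implications hold:

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle\theta, pr\rangle \longmapsto S \end{array}\right\} \Rightarrow S \in Cc(esat\langle\beta, pr\rangle),$$

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle\theta, c\rangle \longmapsto \langle S, cf\rangle \end{array}\right\} \Rightarrow \langle S, cf\rangle \in Cc(esat\langle\beta, c\rangle),$$

$$\left.\begin{array}{l} \theta \in Cc(\beta), \\ \langle\theta, g, c\rangle \longmapsto \langle S, cf\rangle \end{array}\right\} \Rightarrow \langle S, cf\rangle \in Cc(esat\langle\beta, g, c\rangle).$$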
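On finite tables, the first implication of Definition 3.7 can be checked mechanically. The following sketch assumes, purely for illustration, that concrete and abstract behaviors are dictionaries, that the concretization functions are given as Python callables returning finite sets, and that only the listed abstract inputs matter; the name `safely_approximates` and its parameters are hypothetical.

```python
def safely_approximates(sat, behavior, gamma_subst, gamma_seq, abstract_inputs):
    """Check on finite tables that, for every <beta, p> and every theta in Cc(beta),
    the concrete output for <theta, p> lies in Cc(sat<beta, p>)."""
    return all(
        behavior[(theta, p)] in gamma_seq(sat[(beta, p)])
        for (beta, p) in abstract_inputs
        for theta in gamma_subst(beta)
        if (theta, p) in behavior
    )

# Toy data: two concrete inputs, one abstract substitution, two abstract sequences.
behavior = {("t1", "p"): ("s1",), ("t2", "p"): ()}
gamma_subst = {"any": {"t1", "t2"}}.__getitem__
gamma_seq = {"atmost1": {(), ("s1",), ("s2",)}, "fail": {()}}.__getitem__
```

Here the table mapping `("any", "p")` to `"atmost1"` is a safe approximation of `behavior`, whereas mapping it to `"fail"` is not, because the answer `("s1",)` for input `t1` is not covered.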



Theorem 3.8 (Safety of the Abstract Transformation)
Let $\longmapsto \in CB$ and $sat \in AB$. If $sat$ safely approximates $\longmapsto$, then $TAB(sat)$ safely
approximates $\stackrel{TCB}{\longmapsto}$.

We first establish the following result. Remember that if $\longmapsto \in CB$, its extension
in $ECB$ is also denoted by $\longmapsto$ (see Section 2.6).

Lemma 3.9 (Safety of E)

Let $\longmapsto \in CB$ and $sat \in AB$. If $sat$ safely approximates $\longmapsto$, then $E(sat)$ safely
approximates $\longmapsto$ (the extension of $\longmapsto$ in $ECB$).



Proof of Lemma 3.9

We prove the lemma by structural induction on the syntax of the underlying
program. The proof uses the concrete semantic rules of Figure 2, the definition of $E$ in Figure 3,
and the specifications of the abstract operations given in Section 3.2. The proof is
straightforward due to the close correspondence of the concrete and the abstract
semantics. We only detail the reasoning for the base case and for the case of a goal
$(g, l)$ where $l$ is an atom of the form $p(x_{i_1}, \ldots, x_{i_n})$. The other cases are similar.

Base case. Let $\langle\theta, <>, c\rangle \in ECUD$ and $\langle\beta, <>, c\rangle \in EAUD$. Assume that $\theta \in Cc(\beta)$
and $\langle\theta, <>, c\rangle \longmapsto \langle S, cf\rangle$. It must be proven that

$\langle S, cf\rangle \in Cc(E(sat)\langle\beta, <>, c\rangle)$.

This relation holds because of the three following facts:

$\langle S, cf\rangle = \text{EXTC}(c, \theta)$ (by R1),

$\text{EXTC}(c, \theta) \in Cc(\text{EXTC}(c, \beta))$ (by specification of EXTC),

$E(sat)\langle\beta, <>, c\rangle = \text{EXTC}(c, \beta)$ (by definition of $E$).



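Induction step. Let $\langle\theta,(g,l),c\rangle \in ECUD$ and $\langle\beta,(g,l),c\rangle \in EAUD$,
where $l$ is an atom of the form $p(x_{i_1}, \ldots, x_{i_n})$. Assume that $\theta \in Cc(\beta)$
and $\langle\theta,(g,l),c\rangle \longmapsto \langle S,cf\rangle$. It must be proven that

\[ \langle S,cf\rangle \in Cc(C'), \quad \mbox{where } C' = E(sat)\langle\beta,(g,l),c\rangle. \]

By Rule R6, there exist program substitutions and program sequences such that

\[
\begin{array}{lll}
\langle\theta,g,c\rangle \longmapsto \langle S',cf\rangle & & \mbox{(C1)}\\
S' = <\theta_1, \ldots, \theta_i, \ldots> & & \mbox{(C2)}\\
\theta'_k = RESTRG(l,\theta_k) & (1 \leq k \leq Ns(S')) & \mbox{(C3)}\\
\langle\theta'_k,p\rangle \longmapsto S'_k & (1 \leq k \leq Ns(S')) & \mbox{(C4)}\\
S_k = EXTG(l,\theta_k,S'_k) & (1 \leq k \leq Ns(S')) & \mbox{(C5)}\\
S = \Box_{k=1}^{Ns(S')} S_k & \mbox{(concatenation of the $S_k$)} & \mbox{(C6)}
\end{array}
\]

Moreover, by definition of $E(sat)$, there exist abstract values such that

\[
\begin{array}{rcll}
C' & = & EXTGS(l,C,B) & \mbox{(A1)}\\
B & = & sat\langle\beta',p\rangle & \mbox{(A2)}\\
\beta' & = & RESTRG(l,\beta'') & \mbox{(A3)}\\
\beta'' & = & SUBST(C) & \mbox{(A4)}\\
C & = & E(sat)\langle\beta,g,c\rangle & \mbox{(A5)}
\end{array}
\]

The following assertions hold. By A5, C1, and the induction hypothesis,

\[ \langle S',cf\rangle \in Cc(C) \qquad \mbox{(B1)}. \]

By A4, B1, C2, and the specification of SUBST,

\[ \theta_k \in Cc(\beta'') \quad (1 \leq k \leq Ns(S')) \qquad \mbox{(B2)}. \]

By A3, B2, C3, and the specification of RESTRG,

\[ \theta'_k \in Cc(\beta') \quad (1 \leq k \leq Ns(S')) \qquad \mbox{(B3)}. \]

By A2, B3, C4, and the hypothesis that $sat$ safely approximates $\longmapsto$,

\[ S'_k \in Cc(B) \quad (1 \leq k \leq Ns(S')) \qquad \mbox{(B4)}. \]

Finally, by A1, B1, B4, C2, C5, C6, and the specification of EXTGS,

\[ \langle S,cf\rangle \in Cc(C'). \qquad \Box \]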



Proof of Theorem 3.8

The result follows from the definition of TAB (given in the corresponding figure), the
definition of TCB in Section 2.14, and Lemma 3.9. □

The next theorem states that the transformation TAB maintains pre-consistency. 

Theorem 3.10 

Let $sat \in AB$. If $sat$ is pre-consistent, then $TAB(sat)$ is also pre-consistent.

Proof

Let $\longmapsto$ be the concrete semantics of the underlying program. Since $sat$ is
pre-consistent, there exists a concrete behavior $\longmapsto'$ such that

1. $\longmapsto' \sqsubseteq \longmapsto$, and

2. $sat$ safely approximates $\longmapsto'$.

The first condition implies that

\[ \stackrel{TCB}{\longmapsto'} \;\sqsubseteq\; \longmapsto \]

since TCB is monotonic and $\stackrel{TCB}{\longmapsto} = \longmapsto$.

The second condition and Theorem 3.8 imply that

\[ TAB(sat) \mbox{ safely approximates } \stackrel{TCB}{\longmapsto'}. \]

The result follows from the two implied statements and Definition 3.6. □



The next two theorems state closure properties of abstract behaviors, which are 
used to prove the safety of the abstract semantics. 

Theorem 3.11 

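Let $sat$ be a post-fixpoint of TAB. Let $\longmapsto \in CB$. If $sat$ safely approximates
$\longmapsto$, then $sat$ also safely approximates $\stackrel{TCB}{\longmapsto}$.

Proof

Assume that $sat$ safely approximates $\longmapsto$. Let $\langle\theta,p\rangle \in CUD$ and
$\langle\beta,p\rangle \in AUD$. It must be proven that

\[
\left.\begin{array}{l}
\theta \in Cc(\beta),\\
\langle\theta,p\rangle \stackrel{TCB}{\longmapsto} S
\end{array}\right\}
\;\Rightarrow\; S \in Cc(sat\langle\beta,p\rangle).
\]

Assume that the left part of the implication holds. Theorem 3.8 implies that

\[ S \in Cc(TAB(sat)\langle\beta,p\rangle). \]

Since $sat$ is a post-fixpoint and $Cc$ is monotonic,

\[ Cc(TAB(sat)\langle\beta,p\rangle) \subseteq Cc(sat\langle\beta,p\rangle), \]

and then

\[ S \in Cc(sat\langle\beta,p\rangle). \qquad \Box \]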

Theorem 3.12 

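Let $(\longmapsto_i)_{i \in \mathbf{N}}$ be a chain of concrete behaviors. Let $sat \in AB$. If
$sat$ safely approximates $\longmapsto_i$ for all $i \in \mathbf{N}$, then $sat$ safely
approximates $(\sqcup_{i=0}^{\infty} \longmapsto_i)$.

Proof

Let us abbreviate $(\sqcup_{i=0}^{\infty} \longmapsto_i)$ by $\longmapsto$. It is sufficient to
prove that, for any $\langle\beta,p\rangle \in AUD$ and any $\langle\theta,p\rangle \in CUD$,

\[
\left.\begin{array}{l}
\theta \in Cc(\beta),\\
\langle\theta,p\rangle \longmapsto S
\end{array}\right\}
\;\Rightarrow\; S \in Cc(sat\langle\beta,p\rangle).
\]

Fix $\langle\beta,p\rangle$, $\langle\theta,p\rangle$, and $S$ satisfying the left part of the
implication. By Theorem 2.10,

\[ S = \sqcup_{i=0}^{\infty} S_i \quad \mbox{where } \langle\theta,p\rangle \longmapsto_i S_i \ \ (\forall i \in \mathbf{N}). \]

Since $sat$ safely approximates every $\longmapsto_i$,

\[ S_i \in Cc(sat\langle\beta,p\rangle) \quad \mbox{for all } i \in \mathbf{N}. \]

Finally, since $Cc(sat\langle\beta,p\rangle)$ is chain-closed,

\[ S \in Cc(sat\langle\beta,p\rangle). \qquad \Box \]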
The last theorem states our main result. 
Theorem 3.13 (Safety of the Abstract Semantics)

Let $sat$ be a pre-consistent post-fixpoint of TAB. Then $sat$ safely approximates
$\longmapsto$, where $\longmapsto$ is the concrete semantics of the underlying program.

We first establish the following statement.

Lemma 3.14

Let $sat$ be a pre-consistent post-fixpoint of TAB. There exists a chain of concrete
behaviors $(\longmapsto_i)_{i \in \mathbf{N}}$ such that $sat$ safely approximates $\longmapsto_i$
for all $i \in \mathbf{N}$, and $(\sqcup_{i=0}^{\infty} \longmapsto_i) = \longmapsto$, where
$\longmapsto$ is the concrete semantics of the underlying program.



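Proof of Lemma 3.14

The proof is in three steps. First we construct a sequence $(\longmapsto'_i)_{i \in \mathbf{N}}$
of lower-approximations of $\longmapsto$ which is not necessarily a chain; then we modify it
to get a chain $(\longmapsto_i)_{i \in \mathbf{N}}$; finally, we show that
$(\sqcup_{i=0}^{\infty} \longmapsto_i) = \longmapsto$. The proof uses the following property of
program substitution sequences, whose proof is left to the reader. If $S_1$, $S_2$ and $S$
are program substitution sequences such that $S_1 \sqsubseteq S$ and $S_2 \sqsubseteq S$, then
$S_1$ and $S_2$ have a least upper bound, which is either $S_1$ or $S_2$. The least upper
bound is denoted by $S_1 \sqcup S_2$ in the proof.

1. Since $sat$ is pre-consistent, there exists a concrete behavior $\longmapsto'$ such that
   $sat$ safely approximates $\longmapsto'$ and $\longmapsto' \sqsubseteq \longmapsto$. The
   sequence $(\longmapsto'_i)_{i \in \mathbf{N}}$ is defined by

   \[ \longmapsto'_0 \;=\; \longmapsto' \quad \mbox{and} \quad \longmapsto'_{i+1} \;=\; \stackrel{TCB}{\longmapsto'_i} \quad (i \in \mathbf{N}). \]

   Since $\longmapsto' \sqsubseteq \longmapsto$, TCB is monotonic and $\longmapsto$ is a
   fixpoint of TCB, it follows that

   \[ \longmapsto'_i \;\sqsubseteq\; \longmapsto \quad (\forall i \in \mathbf{N}). \]

   Moreover, by Theorem 3.11, $sat$ safely approximates every $\longmapsto'_i$.

2. The chain $(\longmapsto_i)_{i \in \mathbf{N}}$ is now constructed by induction over $i$. The
   correctness of the construction process requires proving that, after each induction
   step, the relation $\longmapsto_i \sqsubseteq \longmapsto$ holds. We first define

   \[ \longmapsto_0 \;=\; \longmapsto'_0. \]

   Let $i \in \mathbf{N}$. Assume, by induction, that
   $\longmapsto_0 \sqsubseteq \cdots \sqsubseteq \longmapsto_i \sqsubseteq \longmapsto$. For every
   $\langle\theta,p\rangle \in CUD$, we define

   \[ \langle\theta,p\rangle \longmapsto_{i+1} (S_1 \sqcup S_2) \quad \mbox{where} \quad
      \langle\theta,p\rangle \longmapsto_i S_1 \ \mbox{ and } \ \langle\theta,p\rangle \longmapsto'_{i+1} S_2. \]

   Since $\longmapsto'_{i+1} \sqsubseteq \longmapsto$ and $\longmapsto_i \sqsubseteq \longmapsto$, we
   have that $\longmapsto_{i+1}$ is well-defined and $\longmapsto_{i+1} \sqsubseteq \longmapsto$.
   Moreover, since $sat$ safely approximates $\longmapsto_i$ (by induction) and
   $\longmapsto'_{i+1}$, and since $S_1 \sqcup S_2$ is equal either to $S_1$ or to $S_2$ in the
   definition of $\longmapsto_{i+1}$, we have that $sat$ safely approximates $\longmapsto_{i+1}$.

3. The Kleene sequence of the concrete semantics is a chain $(\longmapsto''_i)_{i \in \mathbf{N}}$
   defined as follows:

   \[ \longmapsto''_0 \;=\; \longmapsto_{\bot} \quad \mbox{and} \quad \longmapsto''_{i+1} \;=\; \stackrel{TCB}{\longmapsto''_i} \quad (i \in \mathbf{N}). \]

   Since $\longmapsto_{\bot} \sqsubseteq \longmapsto'_0$ and TCB is monotonic, it follows, by
   induction, that

   \[ \longmapsto''_i \;\sqsubseteq\; \longmapsto'_i \;\sqsubseteq\; \longmapsto_i \;\sqsubseteq\; \longmapsto \quad (\forall i \in \mathbf{N}). \]

   Therefore, by definition of the least upper bound and since the least fixpoint is the
   limit of the Kleene sequence,

   \[ \longmapsto \;=\; (\sqcup_{i=0}^{\infty} \longmapsto''_i) \;\sqsubseteq\; (\sqcup_{i=0}^{\infty} \longmapsto_i) \;\sqsubseteq\; \longmapsto. \]

   Thus, $(\sqcup_{i=0}^{\infty} \longmapsto_i) = \longmapsto$. □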



Proof of Theorem 3.13

The result is an immediate consequence of Theorem 3.12 and Lemma 3.14. □



3.5 Related Works 

In this section we first discuss the mathematical approach underlying our abstract
semantics and relate it to the higher-order abstract interpretation frameworks
advocated by Cousot and Cousot (1994). Then, we compare our approach with the
abstract semantics for Prolog with control proposed by Barbuti et al. (1993), by
File and Rossi (1993), and by Spoto (2000).



Cousot and Cousot's Higher-order Abstract Interpretation Frameworks.

As mentioned in the introduction, the traditional approach to abstract interpretation
cannot be applied to approximate the concrete semantics of Section 2. Indeed,
we can define a set-based collecting transformation by lifting the concrete semantics
to sets of program substitution sequences. However, the least fixpoint of the
collecting transformation does not safely approximate the concrete semantics. The
problem can be solved by restricting to sets of $\wp(CPS_D)$ and $\wp(CPSS_D)$ that enjoy
some closure properties ensuring safeness of the least fixpoint. This solution is similar
to the choice of a power-domain structure in denotational semantics (Schmidt,
1988; Stoy, 1977): the needed constructions can in fact be viewed as power-domains.
However, there is no best way to choose the closure properties. Different closure
properties are adequate for different sorts of information. It is therefore advocated
by Cousot and Cousot (1994) that, for higher-order languages, different collecting
semantics should be defined for the same language depending on the kind of
properties to be inferred. In our case, at least two dual collecting semantics could
be defined. Both of them use sets of program substitution sequences that are
chain-closed.

1. The first semantics considers downwards-closed sets $\Sigma$ of program substitution
   sequences, i.e., such that for any $S, S' \in CPSS_D$,

   \[ S \in \Sigma \ \mbox{ and } \ S' \sqsubseteq S \;\Rightarrow\; S' \in \Sigma. \]

   This domain is ordered by inclusion and its minimum is $\{<\bot>\}$. It is adequate
   to infer non-termination and upper bounds on the length of sequences.
   In particular, it is adequate for determinacy analysis. However, it is unable
   to infer termination since $<\bot>$ belongs to any set of sequences.

2. The second semantics considers upwards-closed sets $\Sigma$ of program substitution
   sequences, i.e., such that for any $S, S' \in CPSS_D$,

   \[ S \in \Sigma \ \mbox{ and } \ S \sqsubseteq S' \;\Rightarrow\; S' \in \Sigma. \]

   This domain is ordered by $\Sigma \leq \Sigma' \Leftrightarrow \Sigma' \subseteq \Sigma$ and its
   minimum is $CPSS_D$. It is able to infer termination and lower bounds on the length
   of sequences. It is less adequate than the previous one to infer precise information
   about the substitutions in the sequences because its least fixpoint corresponds to a
   greatest fixpoint in a traditional framework ignoring the sequence structure.

In both cases, the least fixpoint is well-defined because the collecting versions of 
the operations are monotonic, since they have to ensure the closure properties. 
Moreover, the least fixpoint of the collecting semantics safely approximates the 
concrete semantics because all iterates are pre-consistent and the sets are chain- 
closed. Nevertheless, our formalization has some advantages. 

1. It can be more efficient: a single analysis is able to infer all the information 
that can be inferred by the two collecting semantics. 

2. It can be more accurate: there are pre-consistent post-fixpoints that are more 
precise than the intersection of the two collecting semantics. 



Barbuti et al.'s Abstract Semantics. The abstract semantics proposed by Barbuti
et al. (1993) aims at modeling control aspects of logic programs such as search
strategy and selection rule. Their semantics is parametric with respect to a
"termination theory". The meaning of a program is obtained by composing the meaning
of its "logic component" together with a corresponding "termination theory" (the
"control component"). The latter can be provided either by applying techniques
of abstract interpretation or by applying proof procedures. In all cases, control
information is deduced from outside, in the form of a separate termination analysis.
This is the main difference with our framework, where control information, i.e.,
information relative to termination or non-termination, is modeled within the
semantic domains through the notion of substitution sequence.



File and Rossi's Abstract Interpretation Framework. The framework proposed
by File and Rossi (1993) consists of a tabled interpreter which explores OLDT
abstract trees decorated with control information about sure success or failure of
the goals. Such information is used by the cut operation to prune the OLDT-tree
whenever a cut is reached. Sure success is modeled in our framework by abstract
sequences representing only non-empty sequences. The abstract semantics defined
by File and Rossi is operational and non-compositional while ours is compositional
and based on the fixpoint approach. Moreover, the abstract execution of a goal
(g, !) is different. Whenever it is known that g surely succeeds, their framework stops
after generating the first "sure" solution, while ours computes the entire abstract
sequence for g and then cuts it to maintain at most one solution. Our approach
may thus imply some redundant work. However, if g is used in several contexts,
their framework should recognize this situation and expand the OLDT-tree further.



Spoto's Denotational Abstract Semantics. The related work closest to ours is
the denotational abstract semantics proposed by Spoto (2000). He defines a
goal-independent and compositional abstract semantics of Prolog modeling the
depth-first search rule and the cut. His semantics associates to any Prolog program a
sequence of pairs consisting of a "kernel" constraint and its "observability" part.
Intuitively, kernel constraints denote computed answers, while observability
constraints give information about divergent computations and cut executions. The
main difference with our approach is that his semantics is goal-independent while
ours is not. This is due to the fact that our abstract semantics is functional, i.e., it
associates to each program P a function (an abstract behavior) mapping every pair
$\langle\beta,p\rangle$ to an abstract sequence $B$. However, this choice is unrelated to our concrete
semantics: we could as well abstract the concrete semantics by a relational abstract
semantics (Cousot and Cousot, 1992b), making it possible to express dependencies
between input substitutions and the corresponding output substitution sequences.
This is the approach of ( Le Charlier et at, 199E ) where we express dependencies 
between the size of input terms and the number of corresponding output substitu- 
tions. We will go back to this issue at the end of Section 6.2. 



4 Generic abstract interpretation algorithm 



A generic abstract interpretation algorithm is an algorithm that is parametric with
respect to the abstract domains. It can be instantiated by various domains to obtain
different data-flow analyses. Several such algorithms have been proposed for Prolog
(Bruynooghe, 1991; Englebert et al., 1993; Le Charlier et al., 1991; Le Charlier et
al., 1993; Le Charlier and Van Hentenryck, 1994; Le Charlier and Van Hentenryck,
1995; Mellish, 1987; Muthukumar and Hermenegildo, 1992), but they do not handle
the control features of the language, such as the Prolog search rule and the cut.

The algorithm presented here is essentially an instantiation of the universal fixpoint
algorithm described in (Le Charlier and Van Hentenryck, 1993) to the abstract
semantics of Section 3. In particular, it is quite similar to the algorithm presented
in (Le Charlier et al., 1991; Le Charlier and Van Hentenryck, 1994): in fact, the
abstract semantics of Section 3 can be viewed as a proper generalization of the
abstract semantics described in those papers, where the sequences of computed
answer substitutions are no longer abstracted to sets of substitutions.

The universal algorithm in (Le Charlier and Van Hentenryck, 1993) is top-down,
i.e., it computes a subset of the fixpoint (in the form of a set of tuples) containing
the output value corresponding to a distinguished input together with all the
tuples needed to compute it. Top-down algorithms are naturally used to perform
data-flow analyses, where one is interested in collecting the abstract information
corresponding to a class of initial queries described by the distinguished input. It is
in general more efficient to compute only a part of the fixpoint, and this allows one
to use infinite abstract domains, which are more expressive (Cousot and Cousot,
1992c). Although the instantiation of (Le Charlier and Van Hentenryck, 1993) to
our abstract semantics is as mechanical as in our previous works (a slightly more
general widening operator is needed, however), the correctness of the algorithm
involves some new theoretical issues: the pre-consistency of the post-fixpoint has now
to be proven. Nevertheless, since the novel algorithm is in practice very similar to
the algorithm presented in (Le Charlier and Van Hentenryck, 1994), we only discuss
here the extended widening operator, which ensures a good compromise between
efficiency and accuracy. A detailed description of the algorithm and its correctness
proof can be found in (Le Charlier et al., 1997).



4.1 Extended Widening

The extended widening operation used by the novel algorithm is defined as follows.

Definition 4.1 (Extended Widening)

An extended widening on abstract sequences is a (polymorphic$^1$) operation
$\nabla : ASS_D \times ASS_D \rightarrow ASS_D$ that enjoys the following properties. Let
$\{B_i\}_{i \in \mathbf{N}}$ be a sequence of elements of $ASS_D$. Consider the sequence
$\{B'_i\}_{i \in \mathbf{N}}$ defined by

\[ B'_0 = B_0, \qquad B'_{i+1} = B_{i+1} \nabla B'_i \quad (i \in \mathbf{N}). \]

The following conditions hold:

1. $B'_i \geq B_i$ $(i \in \mathbf{N})$;

2. the sequence $\{B'_i\}_{i \in \mathbf{N}}$ is stationary, i.e., there exists $j \geq 0$ such that
   $B'_i = B'_j$ for all $i$ such that $j \leq i$.



An extended widening is slightly more general than a widening (Cousot and
Cousot, 1992c) because the sequence $\{B_i\}_{i \in \mathbf{N}}$ is not required to be a chain.



Let us now explain how the extended widening is used by the algorithm. Given
an input pair $\langle\beta,p\rangle$, the algorithm iterates on the computation of
$TAB(sat)\langle\beta,p\rangle$ until convergence, and concurrently updates $sat$, as follows
(recursive calls, which also modify $sat$, are ignored in the discussion):

1. $B'_0 = B_{\bot}$ is stored in the initial $sat$ as the output for $\langle\beta,p\rangle$;

2. $B_i$ results from the $i$-th execution of $TAB(sat)\langle\beta,p\rangle$;

3. $B'_i = B_i \nabla B'_{i-1}$ is stored in the current $sat$ after the $i$-th execution of
   $TAB(sat)\langle\beta,p\rangle$;

4. the loop is exited when $B_{i+1} \leq B'_i$.

$^1$ The extended widening is parametrized over $D$.

The loop terminates because there must be some $i$ such that $B'_{i+1} = B'_i$
(otherwise Condition 2 of Definition 4.1 would be violated), and, hence,
$B_{i+1} \leq B'_i$ since $B'_{i+1} \geq B_{i+1}$ by Condition 1. The loop can be resumed later on
because some values in $sat$ have been updated (Step 1 is omitted in these subsequent
executions); all re-executions of the loop terminate for the same reasons as the first
one; moreover, the loop can only be resumed finitely many times because no element
in $sat$ can be improved infinitely often, since there is a $j$ such that $B'_i = B'_j$ for all
$i$ greater than or equal to $j$. Note that a local post-fixpoint is attained each time the
loop is exited. Thus a global post-fixpoint is obtained when all loops are terminated
for all values in $sat$. The formal characterization of Definition 4.1 elegantly captures
the idea that the algorithm sticks as closely as possible to the abstract semantics
during the first iterations, and starts lumping the results together only when enough
accuracy is obtained, in order to ensure convergence. The advantage of this
characterization is that no particular value of $j$ is fixed. So we can think of
"intelligent" extended widenings that observe how the successive iterates behave and
that enforce convergence exactly at the right time. The extended widening used in
our experimental evaluation is based on this intuitive idea (see Section 5.2).
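The four steps of the loop can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the algorithm of the paper: the whole table `sat` is collapsed to the single stored value for one input pair, recursive calls are ignored, and the names `iterate_with_widening`, `step`, `widen`, `leq` and the safety bound `max_rounds` are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def iterate_with_widening(
    step: Callable[[T], T],        # one execution of the abstract transformation
    widen: Callable[[T, T], T],    # widen(new, old) plays the role of B_new V B_old
    leq: Callable[[T, T], bool],   # ordering on abstract values
    bottom: T,                     # B_bot, the initial stored value (Step 1)
    max_rounds: int = 10_000,      # assumption: guard against a faulty widening
) -> T:
    """Iterate until the newly computed value is below the stored one."""
    stored = bottom                      # B'_0 = B_bot
    for _ in range(max_rounds):
        new = step(stored)               # B_{i+1} (Step 2)
        if leq(new, stored):             # exit test B_{i+1} <= B'_i (Step 4)
            return stored                # a local post-fixpoint
        stored = widen(new, stored)      # B'_{i+1} = B_{i+1} V B'_i (Step 3)
    raise RuntimeError("the widening did not enforce convergence")


if __name__ == "__main__":
    # Toy domain: integers ordered by <=, where max is an upper bound operation.
    print(iterate_with_widening(lambda b: min(b + 1, 3), max, lambda a, b: a <= b, 0))
```

In the toy run the chain is finite, so any upper bound operation can serve as `widen`; on a domain with infinite ascending chains the widening itself must force the stored values to become stationary.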



5 Cardinality analysis 



The abstract interpretation framework for Prolog presented in previous sections has
been instantiated by a domain of abstract sequences to perform so-called cardinality
analysis; see (Braem et al., 1994). Cardinality analysis approximates the number of
solutions to a goal and is useful for many purposes such as indexing, cut insertion
and elimination (Debray, 1989; Sahlin, 1991), dead code elimination, and memory
management and scheduling in parallel systems (Bueno and Hermenegildo, 1991;
Hermenegildo, 1986). The analysis subsumes traditional determinacy analyses such
as those of (Dawson et al., 1993; Debray, 1989; Giacobazzi and Ricci, 1992; Sahlin,
1991).

This section is organized as follows. First we describe how a generic abstract
domain for cardinality analysis, which is parametric with respect to any domain of
abstract substitutions, can be built. Then, we instantiate this generic domain to the
domain of abstract substitutions Pattern (Le Charlier and Van Hentenryck, 1994).
Finally, we discuss experimental evaluations of the analysis from both accuracy and
efficiency standpoints.



5.1 Generic Abstract Domains for Cardinality Analysis 

In this section, generic domains of abstract sequences and abstract sequences with
cut information are built. The domains are generic with respect to the information
on the substitutions in the sequences, but they provide specific information about
the sequence structure. The latter consists of lower and upper bounds on the number
of substitutions in the sequences and information about the nature (i.e., finite,
incomplete or infinite) of the sequences. This information allows us to perform
non-termination analysis and a limited form of termination analysis. Predicate level
analyses, like determinacy and functionality (Debray and Warren, 1989), which
were previously considered as falling outside the scope of abstract interpretation, can
be performed.

Abstract Substitutions. The substitution part of our generic domain of abstract
sequences is assumed to be an element of an arbitrary domain of abstract
substitutions $AS_D$. The only requirement on $AS_D$ is that it contains a minimum element
$\beta_{\emptyset}$ such that $Cc(\beta_{\emptyset}) = \emptyset$. An abstract domain can always be enhanced
with such an element.



Abstract Sequences. The generic domain of abstract sequences manipulates ter- 
mination information whose domain is defined below. 

Definition 5.1 (Termination Information) 

A termination information $t$ is an element of the set $TI = \{st, snt, pt\}$ endowed
with the ordering $\leq$ defined by

\[ t_1 \leq t_2 \;\Leftrightarrow\; \mbox{either } t_1 = t_2 \mbox{ or } t_2 = pt \qquad (\forall t_1, t_2 \in TI). \]

The symbol $st$ stands for "sure termination" and it characterizes finite sequences;
$snt$ stands for "sure non termination" and characterizes incomplete and infinite
sequences; $pt$ stands for "possible termination" and corresponds to absence of
information.

The domain of abstract substitution sequences is defined as follows.

Definition 5.2 (Abstract Sequences)

Let $D$ be a finite set of program variables. We denote by $ASS_D$ the set of all 4-tuples
$\langle\beta, m, M, t\rangle$ such that $\beta \in AS_D$, $m \in \mathbf{N}$, $M \in \mathbf{N} \cup \{\infty\}$,
and $t \in TI$.

Informally, $\beta$ describes all substitutions in the sequences, $m$ and $M$ are lower
and upper bounds on the number of substitutions in the sequences, and $t$ is an
information on termination.

The ordering on abstract sequences is defined as follows. 

Definition 5.3 (Ordering on Abstract Sequences)

Let $B_1, B_2 \in ASS_D$, with $B_1 = \langle\beta_1, m_1, M_1, t_1\rangle$ and
$B_2 = \langle\beta_2, m_2, M_2, t_2\rangle$.

\[ B_1 \leq B_2 \quad \mbox{iff} \quad \beta_1 \leq \beta_2 \mbox{ and } m_1 \geq m_2 \mbox{ and } M_1 \leq M_2 \mbox{ and } t_1 \leq t_2. \]

The set of program substitution sequences described by an abstract sequence $B$
is formally defined as follows.

Definition 5.4 (Concretization for Abstract Sequences)

Let $B = \langle\beta, m, M, t\rangle \in ASS_D$. We define

\[ Cc(B) = Sseq_1(\beta) \cap Sseq_2(m, M) \cap Sseq_3(t) \]

where

\[
\begin{array}{rcl}
Sseq_1(\beta) & = & \{S : S \in PSS_D \mbox{ and } Subst(S) \subseteq Cc(\beta)\},\\
Sseq_2(m, M) & = & \{S : S \in PSS \mbox{ and } m \leq Ns(S) \leq M\},\\
Sseq_3(snt) & = & \{S : S \in PSS \mbox{ and } S \mbox{ is incomplete or infinite}\},\\
Sseq_3(st) & = & \{S : S \in PSS \mbox{ and } S \mbox{ is finite}\},\\
Sseq_3(pt) & = & PSS.
\end{array}
\]

Monotonicity of the concretization function is a simple consequence of the defi- 
nition. 
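To make Definitions 5.1 to 5.4 concrete, the following Python sketch encodes abstract sequences, their ordering, and the part of the concretization that depends only on the number of substitutions and on the nature of a sequence (the sets Sseq_2 and Sseq_3). It is an illustration, not the paper's implementation: the names `AbstractSeq`, `term_leq`, `seq_leq` and `shape_matches` are hypothetical, the ordering on abstract substitutions is passed in as a parameter, and the substitution component Sseq_1 is deliberately left out.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


def term_leq(t1: str, t2: str) -> bool:
    """Definition 5.1: t1 <= t2 iff t1 = t2 or t2 = pt."""
    return t1 == t2 or t2 == "pt"


@dataclass(frozen=True)
class AbstractSeq:
    beta: Any    # abstract substitution (opaque in this sketch)
    m: int       # lower bound on the number of substitutions
    M: float     # upper bound: an integer or math.inf
    t: str       # "st", "snt" or "pt"


def seq_leq(b1: AbstractSeq, b2: AbstractSeq,
            subst_leq: Callable[[Any, Any], bool]) -> bool:
    """Definition 5.3: componentwise ordering (note the reversed test on m)."""
    return (subst_leq(b1.beta, b2.beta) and b1.m >= b2.m
            and b1.M <= b2.M and term_leq(b1.t, b2.t))


def shape_matches(b: AbstractSeq, n_subst: float, nature: str) -> bool:
    """Sseq_2 and Sseq_3 of Definition 5.4 for a sequence with n_subst
    substitutions whose nature is "finite", "incomplete" or "infinite"."""
    if not (b.m <= n_subst <= b.M):
        return False
    if b.t == "st":
        return nature == "finite"
    if b.t == "snt":
        return nature in ("incomplete", "infinite")
    return True  # pt: no information


if __name__ == "__main__":
    b = AbstractSeq("top", 1, math.inf, "snt")
    print(shape_matches(b, math.inf, "infinite"))
```

An infinite sequence is passed with `n_subst = math.inf`, so it can only be described by an abstract sequence whose upper bound is infinite.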

We denote by $B_{\bot}$ the special abstract sequence $\langle\beta_{\emptyset}, 0, 0, snt\rangle$, which is
such that $Cc(B_{\bot}) = \{<\bot>\}$ as required in Section 3.1. It is easy to prove that for
all abstract sequences $B \in ASS_D$, the set $Cc(B)$ is chain-closed; see (Le Charlier et
al., 1997).



Abstract Sequences with Cut Information. Abstract sequences with cut in- 
formation are obtained by enhancing abstract sequences with information about 
execution of cuts. 

Let us first define the abstract domain for cut information. 

Definition 5.5 (Abstract Cut Information)

An abstract cut information $acf$ is an element of the set $ACF = \{cut, nocut, weakcut\}$.

Definition 5.6 (Abstract Sequences with Cut Information)

Let $D$ be a finite set of program variables. We denote by $ASSC_D$ the set of pairs
$\langle B, acf\rangle$ where $B \in ASS_D$ and $acf \in ACF$.

Informally, $cut$ indicates that a cut has been executed in all sequences, $nocut$ that
no cut has been executed in any sequence, and $weakcut$ that a cut has been executed
for all sequences producing at least one solution. More formally, the concretization
of an abstract sequence with cut information is defined as follows.

Definition 5.7 (Concretization for Abstract Sequences with Cut Information)

Let $B \in ASS_D$. We define

\[
\begin{array}{rcl}
Cc(\langle B, cut\rangle) & = & \{\langle S, cut\rangle : S \in Cc(B)\},\\
Cc(\langle B, nocut\rangle) & = & \{\langle S, nocut\rangle : S \in Cc(B)\},\\
Cc(\langle B, weakcut\rangle) & = & \{\langle S, cut\rangle : S \in Cc(B)\} \;\cup\\
 & & \{\langle S, nocut\rangle : S \in Cc(B) \mbox{ and } S \in \{<>, <\bot>\}\}.
\end{array}
\]
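The role of the three abstract flags can be illustrated by a small Python check. It is a sketch under one stated assumption: the sequence $S$ is already known to belong to $Cc(B)$, so only the flag has to be checked. The function name `cut_flag_compatible` is hypothetical. The sketch relies on the observation that the condition $S \in \{<>, <\bot>\}$ holds exactly when $S$ contains no substitution.

```python
def cut_flag_compatible(acf: str, cf: str, n_subst: float) -> bool:
    """Is the concrete flag cf ("cut" or "nocut") allowed by the abstract flag
    acf for a sequence with n_subst substitutions (Definition 5.7)?"""
    if cf not in ("cut", "nocut"):
        raise ValueError(f"unknown concrete cut flag: {cf!r}")
    if acf == "cut":
        return cf == "cut"
    if acf == "nocut":
        return cf == "nocut"
    if acf == "weakcut":
        # A sequence without cut is only allowed when it yields no solution.
        return cf == "cut" or n_subst == 0
    raise ValueError(f"unknown abstract cut flag: {acf!r}")


if __name__ == "__main__":
    print(cut_flag_compatible("weakcut", "nocut", 0))
```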

5.2 Abstract Operations 

Our next task is to provide definitions of all abstract operations specified in
Section 3.2. For space reasons, we describe here a subset of the operations, i.e., extended
widening, unification, operation treating cut, and concatenation. The other operations
are described in the appendix. The reader is referred to (Le Charlier et al.,
1997) for the correctness proofs.

The operations on abstract substitutions which are used in the definition of the
operations on abstract sequences will be recalled when needed.

Extended Widening: $\nabla : ASS_D \times ASS_D \rightarrow ASS_D$

We require that the abstract domain $AS_D$ is equipped with a widening operation
$\nabla' : AS_D \times AS_D \rightarrow AS_D$. It can be an extended widening, a normal widening,
or, if $AS_D$ is finite or enjoys the finite ascending chain property, any upper bound
operation. The widening on sequences is obtained by taking the least upper bound
of the termination components, the minimum of the lower bounds and setting the
upper bound to infinity.

Assume that $B_{old} = \langle\beta_{old}, m_{old}, M_{old}, t_{old}\rangle$ and
$B_{new} = \langle\beta_{new}, m_{new}, M_{new}, t_{new}\rangle$.

The operation $\nabla : ASS_D \times ASS_D \rightarrow ASS_D$ is defined as follows.

\[
\begin{array}{rcll}
B_{new} \nabla B_{old} & = & \langle\beta_{new} \nabla' \beta_{old}, m_{new}, M_{new}, t_{new}\rangle
  & \mbox{if } \beta_{new} \not\leq \beta_{old}\\
 & = & \langle\beta_{old}, m_{new}, M_{new}, pt\rangle
  & \mbox{if } \beta_{new} \leq \beta_{old} \mbox{ and } t_{new} \not\leq t_{old}\\
 & = & \langle\beta_{old}, \min(m_{new}, m_{old}), \infty, t_{old}\rangle
  & \mbox{if } \beta_{new} \leq \beta_{old} \mbox{ and } t_{new} \leq t_{old} \mbox{ and}\\
 & & & (m_{new} < m_{old} \mbox{ or } M_{new} > M_{old})\\
 & = & B_{old} & \mbox{if } B_{new} \leq B_{old}.
\end{array}
\]

The first case makes sure that the algorithm iterates until the abstract substitution
part stabilizes. When it is stable, the widening is applied on sequences.

Example. Consider the following program:

    repeat.
    repeat :- repeat.

The concrete semantics of this program maps the input $\langle\epsilon, repeat\rangle$, where
$\epsilon$ is the empty substitution, to the infinite sequence $<\epsilon, \ldots, \epsilon, \ldots>$.

On this example, because the program has no variables, our domain of abstract
substitutions only contains two values, say $\beta_{\emptyset}$ and $\beta_{\top}$, such that

\[ Cc(\beta_{\emptyset}) = \emptyset, \qquad Cc(\beta_{\top}) = \{\epsilon\}. \]

Let $B_{\bot} = \langle\beta_{\emptyset}, 0, 0, snt\rangle$. Starting from $B_{\bot}$, the algorithm computes the
abstract sequences

\[
\begin{array}{ll}
B_0 = B_{\bot} & B'_0 = B_{\bot}\\
B_1 = \langle\beta_{\top}, 1, 1, snt\rangle & B'_1 = B_1 \nabla B'_0 = \langle\beta_{\top}, 1, 1, snt\rangle\\
B_2 = \langle\beta_{\top}, 2, 2, snt\rangle & B'_2 = B_2 \nabla B'_1 = \langle\beta_{\top}, 1, \infty, snt\rangle\\
B_3 = \langle\beta_{\top}, 2, \infty, snt\rangle &
\end{array}
\]

Notice that the widening on sequences is applied when the abstract substitution
part stabilizes, i.e., after the computation of the abstract sequence $B_2$. The next
iterate $B_3$ satisfies the property that $B_3 \leq B'_2$. Hence, according to the discussion
in Section 4.1, the execution terminates returning the final value

\[ B'_2 = \langle\beta_{\top}, 1, \infty, snt\rangle. \]

Observe that $B'_2$ safely approximates the concrete infinite sequence
$<\epsilon, \ldots, \epsilon, \ldots>$. Moreover, it expresses the fact that the execution of repeat
surely succeeds at least once and surely does not terminate$^2$.
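The iterates of the repeat example can be replayed with a small Python sketch of the widening on sequences. Several simplifications are assumptions of this sketch and not part of the paper: the abstract substitutions are the two strings `"empty"` and `"top"`, the widening on substitutions is the least upper bound of that two-point lattice, and all names (`AbstractSeq`, `widen`, `seq_leq`, ...) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractSeq:
    beta: str    # "empty" or "top" (two-point lattice, an assumption)
    m: int       # lower bound on the number of substitutions
    M: float     # upper bound: an integer or math.inf
    t: str       # "st", "snt" or "pt"


def term_leq(t1: str, t2: str) -> bool:
    return t1 == t2 or t2 == "pt"


def subst_leq(b1: str, b2: str) -> bool:
    return b1 == b2 or b1 == "empty"


def subst_widen(new: str, old: str) -> str:
    # On a finite lattice any upper bound operation is acceptable.
    return old if subst_leq(new, old) else "top"


def seq_leq(b1: AbstractSeq, b2: AbstractSeq) -> bool:
    return (subst_leq(b1.beta, b2.beta) and b1.m >= b2.m
            and b1.M <= b2.M and term_leq(b1.t, b2.t))


def widen(new: AbstractSeq, old: AbstractSeq) -> AbstractSeq:
    """The four cases of the widening on sequences, tested in order."""
    if not subst_leq(new.beta, old.beta):
        return AbstractSeq(subst_widen(new.beta, old.beta), new.m, new.M, new.t)
    if not term_leq(new.t, old.t):
        return AbstractSeq(old.beta, new.m, new.M, "pt")
    if new.m < old.m or new.M > old.M:
        return AbstractSeq(old.beta, min(new.m, old.m), math.inf, old.t)
    return old  # here new <= old


if __name__ == "__main__":
    b_bot = AbstractSeq("empty", 0, 0, "snt")
    b1p = widen(AbstractSeq("top", 1, 1, "snt"), b_bot)   # B'_1
    b2p = widen(AbstractSeq("top", 2, 2, "snt"), b1p)     # B'_2
    b3 = AbstractSeq("top", 2, math.inf, "snt")           # B_3
    print(b1p, b2p, seq_leq(b3, b2p))
```

The iterates $B_1$, $B_2$ and $B_3$ are entered by hand here; in the algorithm they result from successive executions of the abstract transformation.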

Unification of Two Variables: UNIF-VAR : $AS_{\{x_1,x_2\}} \rightarrow ASS_{\{x_1,x_2\}}$

Given an abstract substitution $\beta$ with domain $\{x_1, x_2\}$, this operation returns an
abstract sequence which represents a set of substitution sequences of length 0 or 1
(depending upon the success or failure of the unification). The terms bound to
$x_1$ and $x_2$ are unified in all these sequences. The operation UNIF-VAR on abstract
sequences uses an upgraded version of the operation UNIF-VAR on abstract
substitutions defined in (Le Charlier et al., 1991; Le Charlier and Van Hentenryck, 1994).
The latter, in addition to the resulting abstract substitution, now produces two
flags indicating whether the unification always succeeds, always fails, or can both
succeed and fail. The additional information is expressed by the boolean values $ss$
and $sf$ as specified below.

Operation UNIF-VAR : $AS_{\{x_1,x_2\}} \rightarrow (AS_{\{x_1,x_2\}} \times Bool \times Bool)$

Let $\beta \in AS_{\{x_1,x_2\}}$ and $\langle\beta', ss, sf\rangle$ = UNIF-VAR($\beta$). The following
conditions hold:

1. $\forall \theta \in Cc(\beta) : \forall \sigma \in SS : (\sigma \in mgu(x_1\theta, x_2\theta) \Rightarrow \theta\sigma \in Cc(\beta'))$;

2. $ss = true \Rightarrow (\forall \theta \in Cc(\beta) : x_1\theta$ and $x_2\theta$ are unifiable$)$;

3. $sf = true \Rightarrow (\forall \theta \in Cc(\beta) : x_1\theta$ and $x_2\theta$ are not unifiable$)$.

Based on the upgraded operation UNIF-VAR for abstract substitutions, we provide
an implementation of the operation UNIF-VAR for abstract sequences, which is
correct with respect to the corresponding specification given in Section 3.2.

The operation UNIF-VAR : $AS_{\{x_1,x_2\}} \rightarrow ASS_{\{x_1,x_2\}}$ on abstract sequences is
defined as follows. Let $\beta \in AS_{\{x_1,x_2\}}$ and $\langle\beta'', ss, sf\rangle$ = UNIF-VAR($\beta$). We
have that UNIF-VAR($\beta$) = $B'$, where $B'$ is the abstract sequence
$\langle\beta', m', M', t'\rangle$ such that

\[
\begin{array}{rcl}
\beta' & = & \beta''\\
m' & = & \mbox{if } ss \mbox{ then } 1 \mbox{ else } 0\\
M' & = & \mbox{if } sf \mbox{ then } 0 \mbox{ else } 1\\
t' & = & st.
\end{array}
\]
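The lifting from the substitution-level result to an abstract sequence is a direct transcription of the four equations. The Python sketch below assumes that the substitution-level operation has already produced the triple of result and flags; the function name `unif_var_seq` and the tuple encoding are hypothetical.

```python
from typing import Any, Tuple


def unif_var_seq(beta2: Any, ss: bool, sf: bool) -> Tuple[Any, int, int, str]:
    """Build the abstract sequence (beta', m', M', t') from the result beta2
    of the unification on abstract substitutions and the flags ss (sure
    success) and sf (sure failure)."""
    m = 1 if ss else 0   # at least one solution only if success is sure
    M = 0 if sf else 1   # at most one solution, none if failure is sure
    return (beta2, m, M, "st")  # unification always terminates


if __name__ == "__main__":
    print(unif_var_seq("beta", ss=False, sf=False))
```

When neither flag is set, the bounds 0 and 1 express that the unification may fail or succeed once.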

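Abstract Interpretation of the Cut: AI-CUT : $ASSC_{D'} \rightarrow ASSC_{D'}$

Let $C = \langle\langle\beta, m, M, t\rangle, acf\rangle$. AI-CUT($C$) =
$\langle\langle\beta', m', M', t'\rangle, acf'\rangle$ where

\[
\begin{array}{rcll}
\beta' & = & \beta & \\
m' & = & \min(1, m) & \\
M' & = & \min(1, M) & \\
t' & = & st & \mbox{if } m \geq 1 \mbox{ or } t = st\\
 & & snt & \mbox{if } M = 0 \mbox{ and } t = snt\\
 & & pt & \mbox{otherwise}\\
acf' & = & cut & \mbox{if } m \geq 1 \mbox{ or } acf = cut\\
 & & nocut & \mbox{if } M = 0 \mbox{ and } acf = nocut\\
 & & weakcut & \mbox{otherwise.}
\end{array}
\]

$^2$ (Footnote to the repeat example.) This example also shows that our framework can
express non-failure properties such as the ones described in (Bossi and Cocco, 1999;
Debray et al., 1997).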



Example. Consider the program

    p(X) :- q(X), !.
    q(X) :- X = a.
    q(X) :- X = b.

For the sake of simplicity we use a simple domain of abstract substitutions which
can be seen as the mode component of the Pattern domain (Le Charlier et al., 1999;
Le Charlier and Van Hentenryck, 1994). The example is intended to illustrate the
abstract execution of the operation AI-CUT. Hence, we do not enter here into the
details of the other operations, but the reader is referred to the appendix for their
definition.

The abstract execution of the procedure p called with its argument being a variable
is as follows. Let

\[ \beta = X \mapsto var \]

be the initial abstract substitution. Let $c$ be the clause of the program defining p.
First, the abstract sequence with cut information $C$ is computed by

\[ C = EXTC(c, \beta) = \langle\langle X \mapsto var, 1, 1, st\rangle, nocut\rangle. \]

Then, the procedure q that occurs in the body of $c$ is executed with $\beta = SUBST(C)$,
returning the abstract sequence

\[ B = \langle X \mapsto ground, 2, 2, st\rangle. \]

Hence, the abstract sequence with cut information $C'$ is computed as follows:

\[ C' = EXTGS(q(X), C, B) = \langle\langle X \mapsto ground, 2, 2, st\rangle, nocut\rangle. \]

Now, the operation AI-CUT($C'$) is applied. Following the definition above, one obtains

\[ \mbox{AI-CUT}(C') = \langle\langle X \mapsto ground, 1, 1, st\rangle, cut\rangle, \]

expressing the fact that a cut in the body of $c$ is surely executed. The final result is

\[ B' = SEQ(C') = \langle X \mapsto ground, 1, 1, st\rangle, \]

stating that the execution of p called with its argument being a variable surely
terminates and succeeds exactly once.

Consider now the abstract execution of the procedure p called with a ground
argument. Let

$$\beta = \{X \mapsto ground\}$$

be the initial abstract substitution. In this case, the abstract sequence with cut
information $C$ is first computed by

$$C = \mathrm{EXTC}(c, \beta) = \langle\langle X \mapsto ground, 1, 1, st\rangle, nocut\rangle.$$

Then, the procedure q is executed with $\beta = \mathrm{SUBST}(C)$, returning

$$B = \langle X \mapsto ground, 0, 1, st\rangle.$$

The abstract sequence with cut information $C'$ is computed as follows:

$$C' = \mathrm{EXTGS}(q(X), C, B) = \langle\langle X \mapsto ground, 0, 1, st\rangle, nocut\rangle.$$

The operation AI-CUT applied to $C'$ returns

$$\mathrm{AI\text{-}CUT}(C') = \langle\langle X \mapsto ground, 0, 1, st\rangle, weakcut\rangle,$$

expressing the fact that, in this case, the computation either fails without executing
the cut or succeeds once after executing the cut. The final result is

$$B' = \mathrm{SEQ}(\mathrm{AI\text{-}CUT}(C')) = \langle X \mapsto ground, 0, 1, st\rangle,$$

stating that the execution of p called with a ground argument succeeds at most
once and surely terminates.
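
The two abstract executions above can be replayed with a small Python sketch. The cut flag follows the three cases of the definition of AI-CUT. The update of the bounds (capping both at 1) and the fact that the abstract substitution and the termination flag are passed through unchanged are assumptions read off the two worked examples, not the full definition; the names `AbsSeq` and `ai_cut` are hypothetical.

```python
from typing import NamedTuple, Tuple


class AbsSeq(NamedTuple):
    beta: str  # toy abstract substitution, e.g. "X->ground"
    m: int     # lower bound on the number of solutions
    M: int     # upper bound on the number of solutions
    t: str     # termination flag: "st", "snt" or "pt"


def ai_cut(seq: AbsSeq, acf: str) -> Tuple[AbsSeq, str]:
    """Abstract cut on a sequence with cut information (hedged sketch)."""
    # Cut flag: the three cases stated in the definition of AI-CUT.
    if seq.m >= 1 or acf == "cut":
        new_acf = "cut"
    elif seq.M == 0 and acf == "nocut":
        new_acf = "nocut"
    else:
        new_acf = "weakcut"
    # Assumption (inferred from the examples): a cut keeps at most the
    # first solution, so both bounds are capped at 1; beta and t are kept.
    return AbsSeq(seq.beta, min(1, seq.m), min(1, seq.M), seq.t), new_acf
```

Calling `ai_cut(AbsSeq("X->ground", 2, 2, "st"), "nocut")` reproduces the first case, and `ai_cut(AbsSeq("X->ground", 0, 1, "st"), "nocut")` the second.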

The Pattern domain used in our experiments is more elaborate than the simple
domain of abstract substitutions used in this example. However, it does not provide
more precision in these cases. A more sophisticated domain where an abstract
sequence is represented as $\langle\langle\beta_1, \ldots, \beta_n\rangle, m, M, t\rangle$, with
$\langle\beta_1, \ldots, \beta_n\rangle$ being an explicit sequence of abstract substitutions, could return
a more precise result in the first case. Indeed, one could obtain
$B = \langle\langle\{X \mapsto a\}, \{X \mapsto b\}\rangle, 2, 2, st\rangle$ and then
$B' = \langle\langle\{X \mapsto a\}\rangle, 1, 1, st\rangle$. However, such a domain could not improve the result in
the second case, since the fact that the output substitution can be either $X \mapsto a$ or
$X \mapsto b$ would be represented by $X \mapsto ground$, as we have done above.

Abstract Lazy Concatenation. The implementation of the operation CONC is
complicated here, in order to get accurate results when the domain $AS_D$ is
instantiated to the domain Pattern. The implementation works on enhanced sets of
abstract sequences which allow us to keep individual structural information about
the results of every clause in order to detect mutual exclusion of the clauses.

Let us motivate the lifting of abstract sequences to enhanced abstract sequences.
Lifting an abstract domain to its power set, see, for instance, (Cousot and Cousot,
1979; Filé and Ranzato, 1994), is sometimes useful when the original abstract domain
is not expressive enough to gain a given level of accuracy. Replacing an abstract
domain by its power set is computationally expensive, however; see (Van Hentenryck
et al., 1993). Sometimes, the accuracy is lost only inside a few operations;
thus, a good compromise can be to lift the domain only locally, when these operations
are executed, and to go back to the simple domain afterwards. This is
exactly what we are going to do for the operation CONC. The lifted version of the
abstract domain that we are about to define is useful when the abstract domain
is able to express definite, but not disjunctive, structural information about terms.
In such a domain, for instance, the principal functor of the term bound to a program
variable can be either definitely known or not known at all; it is not possible
to express that it belongs to a given finite set. The domain Pattern used in our
experiments is an abstract domain of this kind. Disjunctive structural information
is however essential to implement the operation CONC accurately: it allows us to
detect mutually exclusive abstract sequences, i.e., abstract sequences that should not
be "abstractly concatenated" since they correspond to different concrete inputs. In
order to keep disjunctive structural information, our implementation of CONC works
on a finite set of abstract sequences. This set is "normalized" in some way, in order
to simplify the case analysis in the implementation. Basically, we differentiate
between "surely empty" abstract sequences, approximating only sequences of the
form $\langle\rangle$ or $\langle\bot\rangle$, and "surely non empty" abstract sequences, approximating
only sequences of the form $\langle\theta\rangle :: S$. This is useful because sequences such as $\langle\rangle$
or $\langle\bot\rangle$ are possible outputs for any input, while sequences of the form $\langle\theta\rangle :: S$
are only possible for some inputs. Therefore we only have to check incompatibility
of "surely non empty" abstract sequences. This discussion motivates the following
definitions of semi-simple abstract sequences and simple abstract sequences.

Definition 5.8 (Semi-Simple Abstract Sequences)

Let $B = \langle\beta, m, M, t\rangle \in ASS_D$. We say that $B$ is a semi-simple abstract sequence if

1. either $\beta = \beta_\emptyset$ and $m = M = 0$,

2. or $\beta \neq \beta_\emptyset$ and $1 \leq m \leq M$.

Definition 5.9 (Simple Abstract Sequences)

Let $B = \langle\beta, m, M, t\rangle \in ASS_D$. We say that $B$ is a simple abstract sequence if it is
semi-simple and $t \in \{snt, st\}$.

Semi-simple abstract sequences formalize our idea of distinguishing between "surely
empty" and "surely non empty" abstract sequences. Note that, assuming that $\beta_\emptyset$
is the only abstract substitution such that $Cc(\beta_\emptyset) = \emptyset$, we have that $Cc(B) \neq \emptyset$ for
any semi-simple abstract sequence $B$.

Definition 5.10 (Enhanced Abstract Sequences)

Let $D$ be a finite set of program variables. We denote by $ASS_D^{enh}$ the set of all sets
of the form $\{B_1, \ldots, B_n\}$, where $n \geq 0$ and $B_1, \ldots, B_n$ are semi-simple abstract
sequences from $ASS_D$. Elements of $ASS_D^{enh}$ are called enhanced abstract sequences;
they are denoted by $SB$ in the following. The concretization function
$Cc : ASS_D^{enh} \to CSS_D$ is defined by $Cc(SB) = \bigcup_{B \in SB} Cc(B)$.

The operation SPLIT1 transforms an arbitrary abstract sequence into an equivalent
enhanced abstract sequence.

Operation SPLIT1 : $ASS_D \to ASS_D^{enh}$

This operation is required to satisfy the property that, for every $B \in ASS_D$,
$Cc(\mathrm{SPLIT1}(B)) = Cc(B)$. Let $B = \langle\beta, m, M, t\rangle$. We define $SB' = \mathrm{SPLIT1}(B)$ as
$SB' = SB_1 \cup SB_2$ where

$$SB_1 = \begin{cases} \{\langle\beta_\emptyset, 0, 0, t\rangle\} & \text{if } m = 0 \\ \emptyset & \text{otherwise} \end{cases}$$

$$SB_2 = \begin{cases} \{\langle\beta, \max(1, m), M, t\rangle\} & \text{if } \beta \neq \beta_\emptyset \text{ and } \max(1, m) \leq M \\ \emptyset & \text{otherwise.} \end{cases}$$



The operation MERGE is the converse of SPLIT1: it transforms an enhanced abstract
sequence into a plain abstract sequence. Most of the time, this operation loses
part of the information expressed by the enhanced abstract sequence;
but it does not lose any information when the enhanced abstract sequence results
from a single application of SPLIT1.

Operation MERGE : $ASS_D^{enh} \to ASS_D$

The operation MERGE satisfies the following properties:

1. For every $SB \in ASS_D^{enh}$, $Cc(SB) \subseteq Cc(\mathrm{MERGE}(SB))$.

2. For every $B \in ASS_D$, $Cc(\mathrm{MERGE}(\mathrm{SPLIT1}(B))) = Cc(B)$.

The definition of MERGE requires choosing a particular abstract sequence $B_\emptyset$ such
that $Cc(B_\emptyset) = \emptyset$. We decide that $B_\emptyset = \langle\beta_\emptyset, 1, 0, st\rangle$. This choice is arbitrary since
there is no best (least) representation of the empty set of abstract sequences in this
domain. Moreover, the definition uses the binary operation UNION : $(AS_D \times AS_D) \to AS_D$,
which is inherited from our previous framework. The latter is extended to finite
sequences of abstract substitutions as follows:

$$\mathrm{UNION}(\langle\rangle) = \beta_\emptyset$$

$$\mathrm{UNION}(\langle\beta\rangle) = \beta, \quad \text{for every } \beta \in AS_D$$

$$\mathrm{UNION}(\langle\beta_1, \ldots, \beta_n\rangle) = \mathrm{UNION}(\beta_1, \mathrm{UNION}(\langle\beta_2, \ldots, \beta_n\rangle)), \quad \text{for all } \beta_1, \ldots, \beta_n \in AS_D \ (n \geq 2).$$

The operation MERGE can now be defined. Let $\sqcup$ denote the least upper bound
on termination information. Let $SB \in ASS_D^{enh}$ be such that $SB = \{B_1, \ldots, B_n\}$ and
$B_i = \langle\beta_i, m_i, M_i, t_i\rangle$ ($1 \leq i \leq n$). The abstract sequence $B' = \mathrm{MERGE}(SB)$ is such that

$$B' = \begin{cases} B_\emptyset & \text{if } n = 0 \\ B_1 & \text{if } n = 1 \\ \langle \mathrm{UNION}(\langle\beta_1, \ldots, \beta_n\rangle), \min(m_1, \ldots, m_n), \max(M_1, \ldots, M_n), t_1 \sqcup \ldots \sqcup t_n \rangle & \text{if } n \geq 2. \end{cases}$$
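
The round trip between SPLIT1 and MERGE can be illustrated with a short Python sketch. It assumes a toy domain in which an abstract substitution is just a mode name, with `"bottom"` standing for the empty abstract substitution, and it assumes that `pt` is the upper bound of `st` and `snt`; the names `AbsSeq`, `union`, `lub_t`, `split1` and `merge` are hypothetical and do not come from the implementation described in this paper.

```python
from functools import reduce
from typing import FrozenSet, NamedTuple

BOTTOM = "bottom"  # stands for the empty abstract substitution


class AbsSeq(NamedTuple):
    beta: str  # toy abstract substitution: a mode name
    m: int     # lower bound on the number of solutions
    M: int     # upper bound on the number of solutions
    t: str     # "st", "snt" or "pt"


def union(b1: str, b2: str) -> str:
    """Toy UNION (assumed lattice: bottom below every mode, 'any' on top)."""
    if b1 == BOTTOM:
        return b2
    if b2 == BOTTOM or b1 == b2:
        return b1
    return "any"


def lub_t(t1: str, t2: str) -> str:
    """Assumed ordering on termination flags: st and snt are below pt."""
    return t1 if t1 == t2 else "pt"


def split1(b: AbsSeq) -> FrozenSet[AbsSeq]:
    # SB1: the "surely empty" part, present only when m = 0.
    sb1 = {AbsSeq(BOTTOM, 0, 0, b.t)} if b.m == 0 else set()
    # SB2: the "surely non empty" part, with the lower bound raised to 1.
    lo = max(1, b.m)
    sb2 = {AbsSeq(b.beta, lo, b.M, b.t)} if b.beta != BOTTOM and lo <= b.M else set()
    return frozenset(sb1 | sb2)


def merge(sb: FrozenSet[AbsSeq]) -> AbsSeq:
    seqs = list(sb)
    if not seqs:
        return AbsSeq(BOTTOM, 1, 0, "st")  # the chosen empty abstract sequence
    if len(seqs) == 1:
        return seqs[0]
    return AbsSeq(
        reduce(union, (s.beta for s in seqs)),
        min(s.m for s in seqs),
        max(s.M for s in seqs),
        reduce(lub_t, (s.t for s in seqs)),
    )
```

For instance, splitting `AbsSeq("ground", 0, 2, "st")` yields one surely empty and one surely non empty sequence, and merging them gives back the original sequence, in line with the second property of MERGE.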

The notion of simple abstract sequence with cut information is also useful to 
simplify the case analysis in the implementation of CONC. 

Definition 5.11 (Simple Abstract Sequences with Cut Information)

Let $B \in ASS_D$ and $acf \in ACF$. The abstract sequence with cut information
$\langle B, acf\rangle$ is said to be simple if $B$ is simple and $acf \in CF$.

The operation SPLIT2 converts an arbitrary abstract sequence with cut information
into an equivalent set of simple abstract sequences with cut information.

Operation SPLIT2 : $ASSC_D \to \wp(ASSC_D)$

The operation SPLIT2 satisfies the following properties. For every $C \in ASSC_D$,

1. $\bigcup_{C' \in \mathrm{SPLIT2}(C)} Cc(C') = Cc(C)$;

2. all abstract sequences with cut information in $\mathrm{SPLIT2}(C)$ are simple.

Its definition is simple. We first apply the operation SPLIT1 to the abstract sequence
part of $C$. Then we split the cut information. Finally we split the termination
information. Formally, $\mathrm{SPLIT2}(C)$ is defined as follows.

1. Let $C = \langle B, acf\rangle \in ASSC_D$. We define

$$\mathrm{SPLIT2}(C) = \bigcup_{B' \in \mathrm{SPLIT1}(B)} \mathrm{SPLIT2}(\langle B', acf\rangle).$$

2. Let $B = \langle\beta, m, M, t\rangle \in ASS_D$. Assume that $B$ is semi-simple. We define

$$\mathrm{SPLIT2}(\langle B, weakcut\rangle) = \begin{cases} \mathrm{SPLIT2}(\langle B, nocut\rangle) \cup \mathrm{SPLIT2}(\langle B, cut\rangle) & \text{if } m = 0 \\ \mathrm{SPLIT2}(\langle B, cut\rangle) & \text{if } m \geq 1. \end{cases}$$

(Remember that, by Definition 5.8, we also have $\beta = \beta_\emptyset$ and $M = 0$ in the
first case, and $\beta \neq \beta_\emptyset$ and $m \leq M$ in the second case.)

3. Let $B = \langle\beta, m, M, t\rangle \in ASS_D$ and $cf \in CF$. Assume that $B$ is semi-simple.
We define

$$\mathrm{SPLIT2}(\langle B, cf\rangle) = \begin{cases} \{\langle B, cf\rangle\} & \text{if } t \in \{snt, st\} \\ \{\langle\langle\beta, m, M, snt\rangle, cf\rangle, \langle\langle\beta, m, M, st\rangle, cf\rangle\} & \text{if } t = pt. \end{cases}$$
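
Steps 2 and 3 of SPLIT2 amount to a small case analysis, sketched below in Python for an input sequence that is already semi-simple (step 1, the call to SPLIT1, is left out). The representation of abstract sequences as named tuples and the function name `split2_semi_simple` are assumptions made for the illustration.

```python
from typing import List, NamedTuple, Tuple


class AbsSeq(NamedTuple):
    beta: str  # toy abstract substitution
    m: int     # lower bound on the number of solutions
    M: int     # upper bound on the number of solutions
    t: str     # "st", "snt" or "pt"


def split2_semi_simple(b: AbsSeq, acf: str) -> List[Tuple[AbsSeq, str]]:
    """Steps 2 and 3 of SPLIT2; b is assumed to be semi-simple."""
    # Step 2: split the cut information. A weak cut on a surely empty
    # sequence (m = 0) may or may not have been executed; on a surely
    # non empty sequence (m >= 1) it has been executed.
    if acf == "weakcut":
        flags = ["nocut", "cut"] if b.m == 0 else ["cut"]
    else:
        flags = [acf]
    # Step 3: split the termination information pt into snt and st.
    terms = ["snt", "st"] if b.t == "pt" else [b.t]
    return [(AbsSeq(b.beta, b.m, b.M, t), cf) for cf in flags for t in terms]
```

Every pair returned by this sketch has a termination flag in {snt, st} and a cut flag in {nocut, cut}, which is what Definition 5.11 requires of a simple abstract sequence with cut information.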

Before presenting the implementation of CONC, we still need to specify the operation
EXCLUSIVE, which is aimed at detecting incompatible outputs. An implementation
of this operation for the domain Pattern is given in Section 5.3.

Operation EXCLUSIVE : $(AS_D \times AS_D \times AS_D) \to Bool$

The operation EXCLUSIVE satisfies the following property. For all $\beta, \beta_1, \beta_2 \in AS_D$,

$$\mathrm{EXCLUSIVE}(\beta, \beta_1, \beta_2) \Rightarrow \neg(\exists\, \theta \in Cc(\beta),\ \theta_1 \in Cc(\beta_1),\ \theta_2 \in Cc(\beta_2),\ \sigma_1, \sigma_2 \in SS : \theta\sigma_1 = \theta_1 \text{ and } \theta\sigma_2 = \theta_2).$$

We are now ready to describe the operation CONC.

Operation CONC : $(AS_D \times ASSC_D \times ASS_D^{enh}) \to ASS_D^{enh}$

Let $\beta \in AS_D$, $C_1 \in ASSC_D$ and $SB_2 \in ASS_D^{enh}$. $SB' = \mathrm{CONC}(\beta, C_1, SB_2)$ is defined
as follows. We assume that $B_i = \langle\beta_i, m_i, M_i, t_i\rangle$.

1. Let us assume first that $C_1 = \langle B_1, acf_1\rangle$ is simple and $SB_2 = \{B_2\}$.

(a) Suppose that $acf_1 = cut$ or $t_1 = snt$. In this case, we define

$$SB' = \{B_1\}.$$

(b) Suppose, on the contrary, that $acf_1 = nocut$ and $t_1 = st$. We define

$$SB' = \begin{cases} \{B_2\} & \text{if } M_1 = 0 \\ \{\langle\beta_1, m_1, M_1, t_2\rangle\} & \text{if } M_1 \geq 1 \text{ and } M_2 = 0 \\ \{\langle \mathrm{UNION}(\beta_1, \beta_2), m_1 + m_2, M_1 + M_2, t_2\rangle\} & \text{if } M_1 \geq 1 \text{ and } M_2 \geq 1 \text{ and } \neg\mathrm{EXCLUSIVE}(\beta, \beta_1, \beta_2) \\ \emptyset & \text{if } M_1 \geq 1 \text{ and } M_2 \geq 1 \text{ and } \mathrm{EXCLUSIVE}(\beta, \beta_1, \beta_2). \end{cases}$$

2. In the general case, we define

$$SB' = \bigcup_{C \in \mathrm{SPLIT2}(C_1),\ B \in SB_2} \mathrm{CONC}(\beta, C, \{B\}).$$
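
Case 1 of CONC can be sketched in Python as follows. The sketch takes the operations UNION and EXCLUSIVE as parameters, since their definitions depend on the abstract domain; the toy versions `toy_union` and `toy_exclusive` used to exercise it (abstract substitutions are plain strings, and two outputs are incompatible when the input is ground and the outputs are distinct constants) are hypothetical stand-ins, not the Pattern operations.

```python
from typing import Callable, List, NamedTuple


class AbsSeq(NamedTuple):
    beta: str  # toy abstract substitution
    m: int     # lower bound on the number of solutions
    M: int     # upper bound on the number of solutions
    t: str     # "st", "snt" or "pt"


def conc_simple(beta: str, b1: AbsSeq, acf1: str, b2: AbsSeq,
                union: Callable[[str, str], str],
                exclusive: Callable[[str, str, str], bool]) -> List[AbsSeq]:
    """Case 1 of CONC: (b1, acf1) is simple and the second argument is {b2}."""
    # (a) A cut was executed or the first clause loops: b2 is never reached.
    if acf1 == "cut" or b1.t == "snt":
        return [b1]
    # (b) From here on, acf1 == "nocut" and b1.t == "st".
    if b1.M == 0:
        return [b2]
    if b2.M == 0:
        return [AbsSeq(b1.beta, b1.m, b1.M, b2.t)]
    if exclusive(beta, b1.beta, b2.beta):
        return []  # the two outputs cannot arise from the same input
    return [AbsSeq(union(b1.beta, b2.beta), b1.m + b2.m, b1.M + b2.M, b2.t)]


def toy_union(b1: str, b2: str) -> str:
    # Hypothetical: two distinct constants are both ground terms.
    return b1 if b1 == b2 else "ground"


def toy_exclusive(beta: str, b1: str, b2: str) -> bool:
    # Hypothetical: distinct constants are incompatible for a ground input.
    return beta == "ground" and b1 != b2
```

With a variable input, concatenating the results of the two clauses of q from the earlier example gives a sequence with bounds (2, 2); with a ground input the same pair is detected as exclusive and contributes nothing.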
5.3 Instantiation to Pattern

The domain of abstract substitutions Pattern was introduced in (Musumbu, 1990)
and has been used in many of our previous works, e.g., (Englebert et al., 1993;
Le Charlier and Van Hentenryck, 1995). The reader is referred to (Le Charlier and
Van Hentenryck, 1994) for a detailed description of the domain and of its abstract
operations.

The Abstract Domain Pattern. The version of Pattern used in the experimental
evaluation of Section 5.4 can be best viewed as an instantiation of the generic
pattern domain $Pat(\mathcal{R})$ (Cortesi et al., 1994; Cortesi et al., 2000) with mode,
sharing, and arithmetic components.

The key intuition behind $Pat(\mathcal{R})$ is to represent information on some subterms
occurring in a substitution instead of information on terms bound to variables only.
More precisely, $Pat(\mathcal{R})$ may associate the following information with each considered
subterm: (1) its pattern, which specifies the main functor of the subterm (if any)
and the subterms which are its arguments; (2) its properties, which are left unspecified
and are given in the domain $\mathcal{R}$. In addition to the above information, each variable
in the domain of the substitution is associated with one of the subterms. It can be
expressed that two arguments have the same value (and hence that two variables are
bound together) by associating both arguments with the same subterm. It should
be emphasized that the pattern information may be void. In theory, information
on all subterms could be kept but the requirement for a finite analysis makes this
impossible for almost all applications. As a consequence, the domain shares some
features with the depth-k abstraction (Kanamori and Kawamura, 1987), although
$Pat(\mathcal{R})$ does not impose a fixed depth but adjusts it dynamically through upper
bound and widening operations. Note that the identification of subterms (and hence
the link between the structural components and the $\mathcal{R}$-domain) is a somewhat
arbitrary choice. In $Pat(\mathcal{R})$, subterms are identified by integer indices, say $1, \ldots, n$
if $n$ subterms are considered, and we denote sets of indices by the symbol $I$.

More formally, the pattern and same-value components can be described as follows.
The pattern component is a partial function $frm : I \not\to Pat_I$, from the set
of indices $I$ to the set of patterns over $I$, i.e., elements of the form $f(i_1, \ldots, i_n)$,
where $f \in \mathcal{F}$ is a functor symbol of arity $n$ and $i_1, \ldots, i_n \in I$. When the pattern is
undefined for an index $i$, we write $frm(i) = undef$. The same-value component is a
total function $sv : D \to I$, where $D = \{x_1, \ldots, x_n\}$ is the domain of the abstract
substitution.

A pattern component $frm : I \not\to Pat_I$ denotes a set of families $(t_i)_{i \in I}$ of terms
as defined below.

$$Cc(frm) = \{(t_i)_{i \in I} \mid frm(i) = f(i_1, \ldots, i_n) \Rightarrow t_i = f(t_{i_1}, \ldots, t_{i_n}),\ \forall i, i_1, \ldots, i_n \in I,\ \forall f \in \mathcal{F}\}.$$

In order to simulate unification with occur-check, we also assume that every pattern
component $frm$ satisfies the following condition: the relation $\succ\ \subseteq I \times I$ such that
$i \succ j$ if and only if $frm(i)$ is of the form $f(\ldots, j, \ldots)$ must be well-founded.

A pair $\langle sv, frm\rangle$ with $sv : D \to I$ and $frm : I \not\to Pat_I$ is called a structural abstract
substitution; it denotes a set of program substitutions as follows:

$$Cc(\langle sv, frm\rangle) = \{\theta \in PS_D \mid \exists (t_i)_{i \in I} \in Cc(frm) : x_j\theta = t_{sv(x_j)},\ \forall x_j \in D\}.$$
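
The concretization of a structural abstract substitution can be made concrete with a membership test, sketched below in Python. The term representation is an assumption made for the illustration: a variable is a string and a term with functor `f` and arguments is the tuple `(f, arg1, ..., argn)`, so the constant `a` is `("a",)`. The sketch also assumes that every index is reachable from some variable through `sv` and `frm`; the name `in_concretization` is hypothetical.

```python
from typing import Dict, Tuple, Union

Term = Union[str, tuple]  # str: variable; tuple: (functor, arg1, ..., argn)


def in_concretization(theta: Dict[str, Term],
                      sv: Dict[str, int],
                      frm: Dict[int, Tuple]) -> bool:
    """Check whether theta belongs to Cc(<sv, frm>).

    frm maps an index i to (f, i1, ..., in); an index missing from frm
    has an undefined pattern.
    """
    t: Dict[int, Term] = {}  # the family (t_i) built so far
    todo = [(sv[x], theta[x]) for x in sv]
    while todo:
        i, term = todo.pop()
        if i in t:
            if t[i] != term:  # two different terms for one subterm index
                return False
            continue
        t[i] = term
        if i in frm:
            f, *args = frm[i]
            # The term must have the functor and arity given by the pattern.
            if not (isinstance(term, tuple) and term[0] == f
                    and len(term) - 1 == len(args)):
                return False
            # Each argument index receives the corresponding subterm.
            todo.extend(zip(args, term[1:]))
    return True
```

For example, with `sv = {"X": 1, "Y": 2}` and `frm = {1: ("f", 2, 3)}`, the substitution binding X to f(a, Z) and Y to a is accepted, whereas binding X to f(b, Z) and Y to a is rejected because index 2 would have to denote both a and b.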

The $\mathcal{R}$-domain is the generic part which specifies subterm information by describing
properties of a set of tuples $\langle t_1, \ldots, t_n\rangle$ where $t_1, \ldots, t_n$ are terms. As a
consequence, defining the $\mathcal{R}$-domain amounts essentially to defining a traditional
domain on substitutions and its operations. We now describe the various components
of the $\mathcal{R}$-domain, which can be built as an open product (Cortesi et al., 1994;
Cortesi et al., 2000).

The mode component is described in (Le Charlier and Van Hentenryck, 1994) and
associates a mode from the set $Modes = \{var, ground, novar, noground, ngv, gv, any\}$
with each subterm. Formally, it is a total function $mo : I \to Modes$ whose
concretization is defined as

$$Cc(mo) = \{(t_i)_{i \in I} \mid t_i \in Cc(mo(i)),\ \forall i \in I\}.$$

The sharing component maintains information about possible sharing between
pairs of subterms and is also described in (Le Charlier and Van Hentenryck, 1994).
Formally, it is a symmetric relation $ps \subseteq I \times I$ whose concretization is defined as

$$Cc(ps) = \{(t_i)_{i \in I} \mid var(t_i) \cap var(t_j) \neq \emptyset \Rightarrow ps(i, j),\ \forall i, j \in I\}.$$

The arithmetic component is novel and aims at using arithmetic predicates to
detect mutual exclusion between clauses. It approximates information about arithmetic
relationships by rational order constraints, i.e., binary constraints of the form
$i\ \delta\ j$ and unary constraints of the form $i\ \delta\ c$, where $i, j$ are indices,
$\delta \in \{>, \geq, =, \leq, <\}$ and $c$ is an integer constant. For instance, a built-in X > Y + 2 is
approximated by a constraint X > Y. Formally, an element $arithm$ is a set of rational order
constraints over indices, whose concretization is defined as follows (a constraint being
satisfied only if the terms are numbers).

$$Cc(arithm) = \{(t_i)_{i \in I} \mid \forall\, i\ \delta\ j \in arithm : t_i\ \delta\ t_j \text{ and } \forall\, i\ \delta\ c \in arithm : t_i\ \delta\ c\}.$$



The Operation EXCLUSIVE. We describe here the implementation of the operation
EXCLUSIVE on our domain of abstract substitutions. This operation was not
present in our previous works. It aims at detecting situations where two output
abstract sequences $B_1$ and $B_2$ are incompatible, given that they both originate from
the same abstract input substitution $\beta$. Only the abstract substitution components
$\beta_1$ and $\beta_2$ of $B_1$ and $B_2$ are useful to detect such situations. Thus the operation
EXCLUSIVE has three arguments $\beta$, $\beta_1$, and $\beta_2$. (See its specification in Section 5.2.)

Let us first introduce the notion of decomposition of a program substitution
with respect to a structural abstract substitution. It represents the family of terms,
occurring in the program substitution, that are given an index by the structural
abstract substitution.

Definition 5.12 (Decomposition of a Program Substitution)

Let $\langle sv, frm\rangle$ be a structural abstract substitution over domain $D = \{x_1, \ldots, x_n\}$
and set of indices $I$. Let also $\theta \in Cc(\langle sv, frm\rangle)$. The decomposition of $\theta$ with respect
to $\langle sv, frm\rangle$ is the (unique) family of terms $(t_i)_{i \in I}$ such that

$$\theta = \{x_1 / t_{sv(x_1)}, \ldots, x_n / t_{sv(x_n)}\} \text{ and } (t_i)_{i \in I} \in Cc(frm).$$

Existence and uniqueness of the family $(t_i)_{i \in I}$ can be proven by an induction argument
that uses the fact that the relation $\succ$ over $I$ is well-founded. Uniqueness holds
provided that $I$ does not contain any "useless" element, i.e., for every
$i \in I$, there exist a variable $x_j \in D$ and indices $i_1, \ldots, i_k$ such that
$i_1 = sv(x_j)$, $i_1 \succ \ldots \succ i_k$, and $i_k = i$. From now on we assume that this condition
always holds.

The next definition models a property of the structural abstract substitutions
obtained by performing any number of abstract unification steps on another structural
abstract substitution.

Definition 5.13 (Instance of a Structural Abstract Substitution)

Let $\langle sv, frm\rangle$ and $\langle sv', frm'\rangle$ be two structural abstract substitutions over the same
domain $D = \{x_1, \ldots, x_n\}$ and respective sets of indices $I$ and $I'$. Let also $im : I \to I'$
be a total function. We say that $\langle sv', frm'\rangle$ is an instance of $\langle sv, frm\rangle$ with respect
to $im$ if the following conditions hold:

1. $sv' = im \circ sv$;

2. for all $i, i_1, \ldots, i_m \in I$,

$$frm(i) = f(i_1, \ldots, i_m) \Rightarrow frm'(im(i)) = f(im(i_1), \ldots, im(i_m)).$$

Moreover, we say that $\langle sv', frm'\rangle$ is an instance of $\langle sv, frm\rangle$ if there exists a function
$im$ such that the conditions hold.

The next property holds.

Property 5.14

Let $\langle sv, frm\rangle$ and $\langle sv', frm'\rangle$ be two structural abstract substitutions, and let
$im : I \to I'$ be such that $\langle sv', frm'\rangle$ is an instance of $\langle sv, frm\rangle$ with respect to $im$.
Let also $\theta \in Cc(\langle sv, frm\rangle)$, $\theta' \in Cc(\langle sv', frm'\rangle)$, and $\sigma \in SS$. Finally, let $(t_i)_{i \in I}$ and
$(t'_i)_{i \in I'}$ be the decompositions of $\theta$ and $\theta'$ with respect to $\langle sv, frm\rangle$ and $\langle sv', frm'\rangle$,
respectively. Then we have

$$\theta' = \theta\sigma \Rightarrow (t_i\sigma)_{i \in I} = (t'_{im(i)})_{i \in I}.$$

The proof is a simple induction on the well-founded relation $\succ$ induced on $I$ by $frm$.

The next definitions and properties are instrumental to the implementation and
correctness proof of the operation EXCLUSIVE.

Definition 5.15 (Exclusive Pair of Indices)

Let $frm_1$ and $frm_2$ be two pattern components over sets of indices $I$ and $J$,
respectively. Let also $i \in I$ and $j \in J$.

1. We say that $(i, j)$ is directly exclusive with respect to $\langle frm_1, frm_2\rangle$ iff
$frm_1(i) = f(i_1, \ldots, i_p)$, $frm_2(j) = g(j_1, \ldots, j_q)$ and either $f \neq g$ or $p \neq q$.

2. We say that $(i, j)$ is exclusive with respect to $\langle frm_1, frm_2\rangle$ iff $(i, j)$ is directly
exclusive with respect to $\langle frm_1, frm_2\rangle$, or $frm_1(i) = f(i_1, \ldots, i_p)$,
$frm_2(j) = f(j_1, \ldots, j_p)$ and there exists $k$ with $1 \leq k \leq p$ such that $(i_k, j_k)$ is
exclusive with respect to $\langle frm_1, frm_2\rangle$.

Property 5.16

Let $frm_1$ and $frm_2$ be two pattern components over sets of indices $I$ and $J$,
respectively. Let $(t_i)_{i \in I} \in Cc(frm_1)$ and $(t_j)_{j \in J} \in Cc(frm_2)$. Let also $i \in I$ and $j \in J$.

1. If the pair $(i, j)$ is directly exclusive with respect to $\langle frm_1, frm_2\rangle$, then the
terms $t_i$ and $t_j$ are compound and they have distinct principal functors.

2. If the pair $(i, j)$ is exclusive with respect to $\langle frm_1, frm_2\rangle$, then the terms $t_i$
and $t_j$ are distinct ($t_i \neq t_j$).

We are now in a position to provide the implementation of the operation EXCLUSIVE
for the domain Pattern. We just show here a partial implementation which only
uses the pattern, same-value, and mode components, but it gives the idea behind the
complete implementation. For additional details, the reader is referred to (Braem
and Modard, 1994).

Operation EXCLUSIVE : $Pattern \times Pattern \times Pattern \to Bool$

Let $\beta$, $\beta_1$, $\beta_2$ be abstract substitutions over the same domain $D$ and sets of indices
$I$, $I_1$, and $I_2$, respectively. Assume that $\langle sv_1, frm_1\rangle$ and $\langle sv_2, frm_2\rangle$ are instances of
$\langle sv, frm\rangle$ with respect to $im_1$ and $im_2$, respectively. The value of $\mathrm{EXCLUSIVE}(\beta, \beta_1, \beta_2)$
is true if and only if there exists $i \in I$ such that

1. $mo(i) \in \{ngv, novar\}$ and the pair $(im_1(i), im_2(i))$ is directly exclusive with
respect to $\langle frm_1, frm_2\rangle$, or

2. $mo(i) = ground$ and the pair $(im_1(i), im_2(i))$ is exclusive with respect to
$\langle frm_1, frm_2\rangle$.

Correctness of the implementation follows from Properties 5.14 and 5.16; see
(Le Charlier et al., 1997).
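
The partial implementation above can be sketched in Python as follows. Pattern components are represented as dictionaries from indices to tuples `(functor, i1, ..., in)`, an index absent from the dictionary having an undefined pattern; the mode component of the input and the two functions $im_1$ and $im_2$ are also dictionaries. These representations and the function names are assumptions made for the illustration, and the recursion relies on the well-foundedness condition imposed on pattern components.

```python
from typing import Dict, Tuple

Frm = Dict[int, Tuple]  # index -> (functor, i1, ..., in); absent = undef


def directly_exclusive(i: int, j: int, frm1: Frm, frm2: Frm) -> bool:
    """Both patterns are defined and differ in functor or arity."""
    if i not in frm1 or j not in frm2:
        return False
    f, *a = frm1[i]
    g, *b = frm2[j]
    return f != g or len(a) != len(b)


def exclusive_pair(i: int, j: int, frm1: Frm, frm2: Frm) -> bool:
    """Definition 5.15, item 2 (terminates because frm is well-founded)."""
    if directly_exclusive(i, j, frm1, frm2):
        return True
    if i not in frm1 or j not in frm2:
        return False
    # Same functor and arity here: look for an exclusive pair of arguments.
    return any(exclusive_pair(ik, jk, frm1, frm2)
               for ik, jk in zip(frm1[i][1:], frm2[j][1:]))


def exclusive(mo: Dict[int, str], im1: Dict[int, int], im2: Dict[int, int],
              frm1: Frm, frm2: Frm) -> bool:
    """Partial EXCLUSIVE using pattern, same-value and mode components."""
    for i, mode in mo.items():
        p = (im1[i], im2[i])
        if mode in ("ngv", "novar") and directly_exclusive(*p, frm1, frm2):
            return True
        if mode == "ground" and exclusive_pair(*p, frm1, frm2):
            return True
    return False
```

On the earlier example, when the input index has mode ground and the two outputs have patterns a and b, the sketch reports the outputs as exclusive; when the input mode is var it does not, since a variable can be instantiated to either constant.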



Prolog's Built-in Predicates. Prolog's built-in predicates such as test predicates
(var, ground, and the like) or arithmetic predicates (is, <, ...) can be handled in
essentially the same way as abstract unification. Our implementation actually
includes abstract operations that deal with test and arithmetic predicates (Braem and
Modard, 1994). Other built-in predicates can be accommodated as well, including
the predicates assert and retract. However, the treatment of the latter predicates
assumes that dynamic predicates are disjoint from static predicates, i.e., it assumes
that the underlying program P is not modified. A more satisfactory treatment of
dynamic predicates requires introducing a new abstract object representing the
dynamic program; this improvement is a topic for further work.



5.4 Experimental Evaluation 

The experimental results presented in this section provide evidence that
the approach presented in this paper allows one to integrate predicate level analysis
into existing variable level analysis at a reasonable implementation cost. Comparisons
with other cardinality and determinacy analyses can be found in Section 6.
Table 1. Efficiency of the Cardinality Analysis

                 OR                    PC                            PCA
Programs      I       T       I       T     IR     TR       I       T     IR     TR
Qsort        13    0.08      17    0.12   1.31   1.50      13    0.08   1.00   1.00
Qsort2       15    0.08      19    0.12   1.27   1.50      15    0.09   1.00   1.13
Queens       15    0.07      18    0.08   1.20   1.14      18    0.10   1.20   1.43
Press1      532   11.77     581   13.11   1.09   1.11     581   13.45   1.09   1.14
Press2      197    3.27     200    3.56   1.02   1.09     200    3.56   1.02   1.09
Gabriel      78    0.90      84    1.00   1.08   1.11      84    0.98   1.08   1.09
Peep        132    3.21     131   18.85   0.99   5.87     131   19.08   0.99   5.94
Read        432   23.91     458   25.32   1.06   1.06     458   25.37   1.06   1.06
Kalah       115    1.90     121    2.09   1.05   1.10     120    2.11   1.04   1.11
Cs           79    2.19      91    3.05   1.15   1.39      90    3.02   1.14   1.38
Plan         36    0.21      38    0.30   1.06   1.43      38    0.27   1.06   1.29
Disj         64    1.95      68    2.14   1.06   1.10      68    2.12   1.06   1.09
Pg           38    0.32      40    0.36   1.05   1.13      39    0.35   1.03   1.09
Boyer        56    0.76      56    1.15   1.00   1.51      56    1.17   1.00   1.54
Credit       63    0.57      64    0.81   1.02   1.42      64    0.80   1.02   1.40
Mean                                      1.09   1.56                   1.05   1.52




Benchmarks. Our experiments use our traditional benchmarks except that cuts
have been reinserted as in the original versions. In addition, some new programs
have been added. Boyer is a theorem-prover from the DEC-10 benchmarks, Credit
is an expert system from (Sterling and Shapiro, 1986). There are two versions of
Qsort, which differ in the procedure Partition, which uses or does not use auxiliary
predicates for the arithmetic built-ins. All the benchmarks are available by anonymous
ftp from ftp://ftp.info.fundp.ac.be/pub/users/ble/bench.p. They have been
run on a SUN SS-10/20.

Efficiency. The efficiency results are reported in Table 1. Several algorithms are
compared: OR is the original GAIA algorithm on Pattern (Le Charlier and Van
Hentenryck, 1994), PC is the cardinality analysis with Pattern and PCA is PC with
the abstraction for arithmetic predicates. I, T, IR and TR are the number of
iterations, the execution time (in seconds), the iteration ratio and the time ratio
with respect to OR, respectively. The first interesting point to notice is the slight
increase (about 5% on PCA) in iterations when moving from abstract substitutions to
abstract sequences, showing the effectiveness of our widening operator. Even more
important perhaps is the fact that the time overhead of the cardinality analysis is
small with respect to the traditional analysis: PCA is 1.52 times slower than OR on
average. Note that in fact most programs enjoy an even smaller overhead, but Peep is
about 6 times slower than OR in PCA.
This comes from many procedures with many clauses, most of which are not
Table 2. Accuracy of the Cardinality Analysis

                                          P           C           PC          PCA
Programs  Query                  NP     D   %D      D   %D      D   %D      D   %D
Qsort     qsort(g,v)              3     0    0      0    0      0    0      3  100
Qsort2    qsort(g,v)              5     2   40      2   40      2   40      5  100
Queens    queens(g,v)             5     2   40      0    0      2   40      2   40
Press1    test_press(v,v)        47     8   17     19   40     19   40     19   40
Press2    test_press(v,v)        47    12   26     19   40     28   60     28   60
Gabriel   main(v,v)              17     0    0      4   24      4   24      4   24
Peep      comppeeppopt(g,v,g)    24     4   17      7   29     16   67     16   67
Read      read(v,v)              46    11   24     27   59     31   67     31   67
Kalah     play(v,v)              46    16   35     20   43     33   72     40   87
Cs        pgenconfig(v)          32    11   34      7   22     11   34     13   41
Plan      transform(g,g,v)       13     1    8      0    0      1    8      1    8
Disj      top(v)                 28    13   46     11   39     13   46     13   46
Pg        pdsbm(g,v)             10     2   20      3   30      5   50      6   50
Boyer     boyer(g)               24     0    0     20   83     20   83     20   83
Credit    credit(a,a)            26    14   58     11   42     14   54     16   62
Mean                                        24          33          46          58



surely cut; much time is spent in the concatenation operation. Finally, note that 
adding more functionality in the domain did not slow down the analysis by much. 
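
The reported mean overhead can be checked directly from the TR column of Table 1 for PCA. The short Python sketch below recomputes the mean over all fifteen programs and, as an additional illustration that is not in the table, the mean when Peep is left out; the variable names are hypothetical.

```python
# Time ratios (TR) of PCA with respect to OR, copied from Table 1.
tr_pca = {
    "Qsort": 1.00, "Qsort2": 1.13, "Queens": 1.43, "Press1": 1.14,
    "Press2": 1.09, "Gabriel": 1.09, "Peep": 5.94, "Read": 1.06,
    "Kalah": 1.11, "Cs": 1.38, "Plan": 1.29, "Disj": 1.09,
    "Pg": 1.09, "Boyer": 1.54, "Credit": 1.40,
}

# Mean over all programs: this is the figure in the "Mean" row.
mean_all = sum(tr_pca.values()) / len(tr_pca)

# Mean without the outlier Peep, to quantify "an even smaller overhead".
others = [v for name, v in tr_pca.items() if name != "Peep"]
mean_without_peep = sum(others) / len(others)

print(f"mean TR (all programs): {mean_all:.2f}")
print(f"mean TR (without Peep): {mean_without_peep:.2f}")
```

The first figure agrees with the 1.52 of the table; the second comes out at roughly 1.20, which is consistent with the remark that a single program accounts for a large share of the average overhead.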

Accuracy. The accuracy results are reported in Table 2. For each program we
specify the initial query to which the abstract interpretation algorithm is applied
(we denote by a, g and v the modes any, ground and var, respectively). Several
versions of the algorithm are compared with respect to their ability to detect
determinacy of procedures, which was our primary motivation. P uses only the
domain Pattern (i.e., cuts are ignored), C uses only the cut (i.e., EXCLUSIVE
always returns false), and PC and PCA are defined as previously. In the table, NP stands
for the number of procedures and D and %D denote the number of procedures and
the percentage of procedures, respectively, that are detected to be deterministic by
the algorithms. There are several interesting points to notice. First, PCA detects
that 58% of the procedures are deterministic, although many of these programs in
fact make heavy use of the nondeterminism of Prolog. Most of the results are optimal and
a nice example is the program Kalah. Second, the cut and input/output patterns
are really complementary to improve the analysis. Input/output patterns alone give
41% of the deterministic procedures (i.e., those detected by PCA), while the cut
detects 57% of the deterministic procedures. The abstraction of arithmetic predicates
adds 21% of deterministic procedures.³ The main lesson here is that all components
are of primary importance to obtain precise results.



6 Related works on determinacy analysis

Determinacy of logic programs in general and of Prolog programs in particular is an
important research topic because determinate programs can be implemented more
efficiently than non-determinate programs (often, much more efficiently). Several
forms of determinacy have been identified, which lead to different kinds of
optimizations. In this section, we review a few interesting papers on determinacy analysis
in the light of our novel framework for the abstract interpretation of Prolog. The
benefit of this study is twofold: first, it sheds new light on these analyses in the
context of abstract interpretation; second, it supports the claim that our proposal
is appropriate to integrate most existing analyses into a single framework.



6.1 Sahlin's Determinacy Analysis for Full Prolog 



The analysis proposed by D. Sahlin (1991) aims at detecting procedures of a (full)
Prolog program that are determinate (i.e., they succeed at most once) or
fully-determinate (i.e., they succeed exactly once). The analysis is developed in the
context of the partial evaluator Mixtus (Sahlin, 1993) in order to detect situations
where cuts can be "executed" or removed. Sahlin's analysis is not based on abstract
interpretation; hence he provides a specific correctness proof for it.

In this section, we show that the determinacy analysis proposed by Sahlin (1991)
is indeed an instance of our framework over his abstract domain.

Abstract Domains. Sahlin's analysis completely ignores information on program 
variables. The abstract domains are concerned with the sequence structure only: 
substitutions are completely ignored. Note that no abstract interpretation framework
available at the time of his writing was adequate for his needs.

Abstract Substitutions. Since program variables are ignored, we can assume a 
domain AS consisting of an arbitrary single element. 

Abstract Sequences. Sahlin's analysis can be formalized in our framework by
defining $ASS = \wp(AASS)$, where $AASS = \{\bot, 0, 1, 1', 2, 2'\}$.⁴ We call elements of
$AASS$ atomic abstract sequences. Their concretization is defined as follows:

$$Cc(\bot) = \{\langle\bot\rangle\}$$

$$Cc(0) = \{\langle\rangle\}$$

$$Cc(1) = \{S \in PSS \mid Ns(S) = 1 \text{ and } S \text{ is finite}\}$$

$$Cc(1') = \{S \in PSS \mid Ns(S) = 1 \text{ and } S \text{ is incomplete}\}$$

$$Cc(2) = \{S \in PSS \mid Ns(S) > 1 \text{ and } S \text{ is finite}\}$$

$$Cc(2') = \{S \in PSS \mid Ns(S) > 1 \text{ and } S \text{ is incomplete or infinite}\}$$



3 Notice that 24/58 = 0.41, 33/58 = 0.57 and (58 - 46)/58 = 0.21. The inequality
41 + 57 + 21 ≠ 100 can be understood by the fact that the analyses computed by P, C and A
(the latter being the algorithm that only considers the arithmetic predicates) are not
completely exclusive.

4 We choose to denote the elements of AASS by the same symbols as in (Sahlin, 1991).
The concretization function $Cc : ASS \to \wp(PSS)$ is defined by:

$$Cc(B) = \bigcup_{b \in B} Cc(b).$$

The relation $\leq$ on $ASS$ is naturally defined as being set inclusion. The concretization
function is thus clearly monotonic.

Abstract Sequences with Cut Information. We define the set $ASSC$ as being
equal to $\wp(AASS \times CF)$. The elements of $ASSC$ are denoted by $\bot_n$, $0_n$, $1_n$, $1'_n$, $2_n$,
$2'_n$, $\bot_c$, $0_c$, $1_c$, $1'_c$, $2_c$, $2'_c$ in (Sahlin, 1991), where the index $n$ stands for nocut, while
the index $c$ stands for cut. The concretization function is defined in the obvious way.

Extended Widening. In order to instantiate our generic abstract interpretation
algorithm to the above domains, it remains to provide an implementation of the
various abstract operations. This can be done systematically from the specifications
of the operations and the domain definitions; we leave it as an exercise to the
reader, except for the extended widening, whose implementation is not obvious.
The basic intuition behind the extended widening is that it should "observe" how
the abstract sequences evolve between consecutive iterations in order to ensure
convergence when enough accuracy seems to be attained. In this abstract domain,
the abstract sequence $B_i$ produced at step $i$ may intuitively differ from $B_{i-1}$ by the
fact that some "incomplete" elements (i.e., $\bot$, $1'$, $2'$) can be removed and replaced
by more "complete" ones. Of course the computation starts with $B_0 = \{\bot\}$. Thus
the algorithm waits until "enough incomplete elements have been removed" and
then accumulates the next iteration results to enforce termination. This can be
formalized by defining a pre-order $\sqsubseteq$ over $ASS$ such that $B_1 \sqsubseteq B_2$ holds when $B_2$
only contains elements that are "more complete" than some elements of $B_1$ and
when, conversely, $B_1$ only contains elements that are "less complete" than some
elements of $B_2$. We first define the relation "is strictly less complete than" between
atomic abstract sequences by the table:

$$\bot \sqsubset 0 \quad \bot \sqsubset 1 \quad \bot \sqsubset 1' \quad \bot \sqsubset 2 \quad \bot \sqsubset 2' \quad 1' \sqsubset 1 \quad 1' \sqsubset 2 \quad 1' \sqsubset 2' \quad 2' \sqsubset 2.$$

Then, for all atomic abstract sequences $b_1$ and $b_2$, we say that $b_1$ is less complete
than $b_2$, denoted by $b_1 \sqsubseteq b_2$, if $b_1 = b_2$ or $b_1 \sqsubset b_2$. This relation is lifted to general
abstract sequences as follows:

Definition 6.1 (Computational Pre-Ordering)
Let $B_1, B_2 \in ASS$. By definition,

$B_1 \sqsubseteq B_2$ iff ($\forall b_1 \in B_1$, $\exists b_2 \in B_2$ such that $b_1 \sqsubseteq b_2$) and
($\forall b_2 \in B_2$, $\exists b_1 \in B_1$ such that $b_1 \sqsubseteq b_2$).

We write $B_1 \sqsubset B_2$ to denote the condition ($B_1 \sqsubseteq B_2$ and $B_2 \not\sqsubseteq B_1$).
We are now in a position to define the extended widening.

Definition 6.2 (Extended Widening for Sahlin's Domain: $B' = B_{new} \nabla B_{old}$)

$$B' = \begin{cases} B_{new} & \text{if } B_{old} \sqsubset B_{new}; \\ B_{new} \cup B_{old} & \text{otherwise.} \end{cases}$$
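Definitions 6.1 and 6.2 translate almost literally into executable form. The sketch below is an illustration only: abstract sequences are Python sets of strings, the first element of $AASS$ is written `"L"`, and the names `STRICT`, `leq_atom`, `leq`, `strictly_below` and `widen` are ours.

```python
# Strict "less complete" relation between atomic abstract sequences,
# transcribed from the table above ("L" stands for the first element of AASS).
STRICT = {
    ("L", "0"), ("L", "1"), ("L", "1'"), ("L", "2"), ("L", "2'"),
    ("1'", "1"), ("1'", "2"), ("1'", "2'"), ("2'", "2"),
}


def leq_atom(b1, b2):
    """b1 is less complete than b2 (reflexive closure of STRICT)."""
    return b1 == b2 or (b1, b2) in STRICT


def leq(B1, B2):
    """Computational pre-ordering of Definition 6.1."""
    return (all(any(leq_atom(b1, b2) for b2 in B2) for b1 in B1)
            and all(any(leq_atom(b1, b2) for b1 in B1) for b2 in B2))


def strictly_below(B1, B2):
    """Strict part of the pre-order: B1 below B2 but not conversely."""
    return leq(B1, B2) and not leq(B2, B1)


def widen(B_new, B_old):
    """Extended widening of Definition 6.2."""
    B_new, B_old = frozenset(B_new), frozenset(B_old)
    return B_new if strictly_below(B_old, B_new) else B_new | B_old
```

Starting from `{"L"}`, a new result `{"1'"}` replaces the old one because it is strictly more complete, whereas two equivalent results such as `{"L", "0", "2'"}` and `{"L", "0", "1'", "2'"}` are merged.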
In fact, the above operation does not fulfill, strictly speaking, the requirements
for being an extended widening. It works however if we have $B_{old} \sqsubseteq B_{new}$ each time
it is applied. This is normally the case if the other abstract operations are carefully
implemented, since each iteration of the abstract interpretation algorithm should
replace every element in $B_{old}$ by one or several more complete elements. Before
stating what is actually achieved by the operation $\nabla$, we need two definitions.

Definition 6.3 (Equivalent Abstract Sequences)
Let $B_1, B_2 \in ASS$. By definition,

$B_1 \approx B_2$ iff $B_1 \sqsubseteq B_2$ and $B_2 \sqsubseteq B_1$.

The relation $\approx$ is an equivalence because $\sqsubseteq$ is a pre-order. It can be shown that
$\approx$ determines 42 equivalence classes, of which 28 are a singleton (e.g., $\{\{\mathcal{L}, 0, 1'\}\}$),
10 have 2 elements (e.g., $\{\{\mathcal{L}, 0, 2'\}, \{\mathcal{L}, 0, 1', 2'\}\}$), and 4 have 4 elements (e.g.,
$\{\{\mathcal{L}, 0, 2\}, \{\mathcal{L}, 0, 2, 2'\}, \{\mathcal{L}, 0, 1', 2\}, \{\mathcal{L}, 0, 1', 2, 2'\}\}$). It is also important to note that
distinct equivalent abstract sequences always have different concretizations.
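The count of equivalence classes can be checked mechanically by enumerating the 64 subsets of $AASS$. The following Python sketch does so under our own encoding (sets of strings, with `"L"` for the first element of $AASS$); the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

ATOMS = ["L", "0", "1", "1'", "2", "2'"]

# Strict "less complete" relation between atomic abstract sequences.
STRICT = {
    ("L", "0"), ("L", "1"), ("L", "1'"), ("L", "2"), ("L", "2'"),
    ("1'", "1"), ("1'", "2"), ("1'", "2'"), ("2'", "2"),
}


def leq_atom(b1, b2):
    return b1 == b2 or (b1, b2) in STRICT


def leq(B1, B2):
    """Computational pre-ordering of Definition 6.1."""
    return (all(any(leq_atom(b1, b2) for b2 in B2) for b1 in B1)
            and all(any(leq_atom(b1, b2) for b1 in B1) for b2 in B2))


def equivalence_classes():
    """Group all subsets of ATOMS by the equivalence of Definition 6.3."""
    subsets = [frozenset(c)
               for r in range(len(ATOMS) + 1)
               for c in combinations(ATOMS, r)]
    classes = []
    for B in subsets:
        for cls in classes:
            rep = cls[0]  # one representative suffices: the relation is transitive
            if leq(B, rep) and leq(rep, B):
                cls.append(B)
                break
        else:
            classes.append([B])
    return classes


classes = equivalence_classes()
sizes = Counter(len(cls) for cls in classes)
print(len(classes), dict(sorted(sizes.items())))
```

Running the sketch prints `42 {1: 28, 2: 10, 4: 4}`, in agreement with the figures stated above.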

Definition 6.4 (Strengthened Computational Ordering)
Let $B_1, B_2 \in ASS$. By definition,

$B_1 \leq B_2$ iff $B_1 \sqsubset B_2$ or ($B_1 \approx B_2$ and $B_1 \subseteq B_2$).

The relation $\leq$ is an order; every ascending sequence $B_1 \leq B_2 \leq \ldots \leq B_i \ldots$ is
stationary since $ASS$ is finite.

Property 6.5 (Conditional Convergence of the Extended Widening)

Let $\{B_i\}_{i \in N}$ and $\{B'_i\}_{i \in N}$ be two sequences of elements of $ASS$ such that

1. $B'_i \sqsubseteq B_{i+1}$, for all $i \in N$;

2. $B'_{i+1} = B_{i+1} \nabla B'_i$, for all $i \in N$.

Then we have $B_i \leq B'_i$, for all $i \in N^*$, and the sequence $\{B'_i\}_{i \in N}$ is stationary.
Proof

The fact that $B_i \leq B'_i$, for all $i \in N^*$, is a direct consequence of the definition of
the operation $\nabla$. Moreover, the hypotheses on the sequences ensure that $B'_1 \leq B'_2 \leq
\ldots \leq B'_i \ldots$; thus the sequence $\{B'_i\}_{i \in N}$ is stationary. □



If all abstract operations are congruent with respect to $\sqsubseteq$,$^5$ each iteration of the
abstract interpretation algorithm ensures that $B_{old} \sqsubseteq B_{new}$, where $B_{old}$ is the
current value in sat and $B_{new}$ is the newly computed abstract sequence. Thus,
Property 6.5 guarantees termination of the abstract interpretation algorithm. Congruence
of the abstract operations with respect to $\sqsubseteq$ is ensured if they are "as accurate
as possible" (which is achieved in (Sahlin, 1991)); however, proving this property
entails a lot of work. A simpler solution consists of testing whether $B_{old} \sqsubseteq B_{new}$
actually holds before each application of the extended widening. If the condition
does not hold, we switch to a cruder form of widening, which simply merges all
successive results.

5 We would have written monotonic if the relation $\sqsubseteq$ was an order, not a pre-order only.
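The guarded strategy described above can be sketched as a small stateful operator. The sketch is parametrized by the pre-order, which is assumed to be supplied as a function `leq(B1, B2)` on sets; the class name `GuardedWidening` is ours, and reading "switch" as a permanent change of mode is our interpretation of the text.

```python
class GuardedWidening:
    """Extended widening applied only while the old result is below the new one.

    leq is the pre-order on abstract sequences (assumed given). Once the
    guard fails, the operator permanently falls back to plain union.
    """

    def __init__(self, leq):
        self.leq = leq
        self.crude = False  # becomes True once the guard has failed

    def __call__(self, b_new, b_old):
        b_new, b_old = frozenset(b_new), frozenset(b_old)
        if not self.leq(b_old, b_new):
            self.crude = True            # guard violated: switch mode
        if self.crude:
            return b_new | b_old         # cruder widening: merge results
        if self.leq(b_new, b_old):
            return b_new | b_old         # equivalent results: accumulate
        return b_new                     # strictly more complete: replace
```

The fallback trades accuracy for a termination argument that no longer depends on the congruence of the other abstract operations: successive unions in a finite domain necessarily stabilize.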

Comparison with our Cardinality Analysis. The determinacy information
inferred by means of Sahlin's domain is in general less accurate than our cardinality
analysis (except maybe in some partial evaluation contexts). For instance, with the
former domain, it is not possible to detect mutually exclusive clauses except when
cuts occur in the clauses. As illustrated in Section 5.3, the information provided
by the abstract substitution component of our domain is instrumental in detecting
sure failure, sure success, and mutual exclusion, which all contribute to improving
the accuracy of the determinacy (or cardinality) analysis. Nevertheless, the specific
information about the sequence structure is finer grained in Sahlin's domain than in
ours. Consider the abstract sequence $\{\mathcal{L}, 1\}$; it is approximated, in our domain, by
$(0, 1, pt)$, which is actually equivalent to $\{\mathcal{L}, 0, 1, 1'\}$. Thus, it could be interesting
to design a domain for abstract sequences similar to our cardinality domain, where
the sequence component coincides with Sahlin's domain.

6.2 Giacobazzi and Ricci's Analysis of Determinate Computations 



The work of Giacobazzi and Ricci (1992) is also worth reviewing in our
context. They propose an analysis of functional dependencies (Mendelzon, 1991)
between procedure arguments of the success set of pure logic programs. Their
analysis is a bottom-up abstract interpretation, based on (Barbuti et al., 1993; Falaschi
et al., 1989). The analysis also infers groundness information and is intended to be
used for parallel logic program optimization. In our comparison, we focus on the
functional dependencies and we simplify the presentation in order to concentrate on
the salient points. First, we provide a definition of functional dependency tailored
to our framework. The definitions use some notions from Section 5.3.



Definition 6.6 (Functional Dependency)

Let $\langle sv, frm \rangle$ be a structural abstract substitution over domain $D$ and set of indices
$I$. A functional dependency for $\langle sv, frm \rangle$, denoted by $J \to j$, is a pair consisting of
a subset $J$ of $I$ and an index $j \in I$.

Let $S \in PSS_D$ be a program substitution sequence such that $Subst(S) \subseteq Cc(\langle sv, frm \rangle)$.
We say that the functional dependency $J \to j$ holds in $S$ for $\langle sv, frm \rangle$ if, for all
families of terms $(t_i)_{i \in I}$, $(t'_i)_{i \in I}$ that are decompositions of some program
substitutions of $Subst(S)$, the following implication is true:

$$(t_i)_{i \in J} = (t'_i)_{i \in J} \;\Rightarrow\; t_j = t'_j.$$

Then we define an abstract domain to express functional dependencies. 
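To make Definition 6.6 concrete, the following Python sketch checks a functional dependency over a finite collection of decompositions. It is an illustration under simplifying assumptions of ours: each decomposition is a dictionary from indices to terms, terms are plain strings compared syntactically, and the function name `fd_holds` is hypothetical.

```python
def fd_holds(J, j, decompositions):
    """Check that J -> j holds: decompositions agreeing on J agree on j.

    decompositions is a finite iterable of dicts mapping each index to a
    term (here any hashable value, e.g. a string).
    """
    seen = {}  # maps the terms at the indices of J to the term at index j
    for t in decompositions:
        key = tuple(t[i] for i in sorted(J))
        if key in seen and seen[key] != t[j]:
            return False
        seen.setdefault(key, t[j])
    return True


# Two solutions of a list concatenation goal, with indices 1, 2, 3
# standing for the three arguments.
solutions = [
    {1: "[]", 2: "[a]", 3: "[a]"},
    {1: "[a]", 2: "[]", 3: "[a]"},
]
print(fd_holds({1, 2}, 3, solutions))  # the first two arguments determine the third
print(fd_holds({3}, 1, solutions))     # the third does not determine the first
```

The two calls print `True` and `False` respectively: the second dependency fails because both solutions share the same third argument but differ on the first.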

Definition 6.7 (Abstract Sequences with Functional Dependencies)
An abstract sequence with functional dependencies is a triple $\langle sv, frm, fd \rangle$ where
$\langle sv, frm \rangle$ is a structural abstract substitution over domain $D$ and set of indices $I$,
and $fd$ is a set of functional dependencies for $\langle sv, frm \rangle$. The concretization function
for abstract sequences with functional dependencies is defined by

$$Cc(\langle sv, frm, fd \rangle) = \left\{ S \in PSS_D \;\middle|\; \begin{array}{l} Subst(S) \subseteq Cc(\langle sv, frm \rangle) \text{ and} \\ J \to j \text{ holds in } S \text{ for } \langle sv, frm \rangle, \\ \text{for every } J \to j \in fd \end{array} \right\}.$$



In fact, the functional dependency component $fd$ is best viewed as an additional
component to the cardinality domain defined in Section 5, since its usefulness for
determinacy analysis depends on the availability of mode information. Let $S \in
CPSS_D$ be a canonical program substitution sequence. We say that $S$ is functional
if the set $Subst(S)$ is empty or is a singleton. Such sequences model the behavior of
procedures that cannot produce two or more distinct solutions. Assume that $S$ is
the output sequence corresponding to the input substitution $\theta$, for some procedure
$p$. Assume that $\theta \in Cc(\langle sv, frm \rangle)$ and $S \in Cc(\langle sv', frm', fd' \rangle)$ where $\langle sv', frm' \rangle$ is
more instantiated than $\langle sv, frm \rangle$. We can infer that $S$ is functional if there exists
$J \subseteq I'$ such that $fd'$ contains a functional dependency of the form $J \to i$, for every
$i \in sv'(D)$, and if every term $t_j$ corresponding to an index $j \in J$ in a program
substitution of $S$ is not more instantiated than the corresponding term in $\theta$. The
latter information is easily deduced if we know, for instance, that $t_j$ is ground or
is a variable. Thus adding a functional dependency component to our cardinality
domain allows us to infer that output program substitution sequences are functional.

It is important to point out that the new component $fd$ expresses a property of
program substitution sequences, not a property of (single) program substitutions. It
is meaningless to use functional dependencies in a domain of abstract substitutions,
because a set of functional dependencies determines a (two valued) condition on a
set of program substitutions. Either the set verifies the condition, in which case no constraint
is added, or it does not and the set is rejected as a whole. Thus, a component $fd$
defines a set of sets of program substitutions. As a consequence, functional
dependencies cannot be handled by previous top-down abstract interpretation frameworks
such as (Bruynooghe, 1991; Le Charlier and Van Hentenryck, 1994; Marriott and
Søndergaard, 1989a; Mellish, 1987; Muthukumar and Hermenegildo, 1992; Warren,
1992; Winsborough, 1992). However, the abstract interpretation framework used
by (Giacobazzi and Ricci, 1992) is bottom-up and abstracts the success set of the
program. The result of an analysis represents a set of possible success sets, i.e., a
set of sets of output patterns, which is similar to a set of sets of program
substitutions. As far as we know, it is the first time that this difference of expressivity
between bottom-up and (previous) top-down abstract interpretation frameworks
is pointed out in the literature. The comparison usually concentrates on the fact
that bottom-up frameworks are goal independent, i.e., they provide information on
the program as a whole, while top-down frameworks are goal dependent, i.e., they
provide information about the program and a given initial goal. We believe that a
more fundamental difference lies in the fact that top-down frameworks are
functional, i.e., they abstract the behavior of a program by a function between sets of
sets, while bottom-up frameworks are relational, i.e., they abstract the behavior
of a program by a set of relations. The difference between the two approaches has
been previously put forward by Cousot and Cousot (1992b), but not in the
context of logic programs. The functional approach can easily focus on small parts
of the program behavior but loses the dependencies between inputs and outputs;
the converse holds for the relational approach. Our novel framework is basically
functional, but the domain of abstract sequences is in some sense relational; thus
the framework allows us to combine the advantages of both approaches.



6.3 Debray and Warren's Analysis of Functional Computations 

In the previous section, we have shown that functional dependencies are useful
to infer that an output program substitution sequence is functional, i.e., does not
contain two or more distinct program substitutions. Such a sequence may contain
several occurrences of the same program substitution, however. The importance
of functional computations for logic program optimization was advocated early by
Debray and Warren (1989). In that paper, these authors propose a sophisticated
algorithm to infer functional computations of a logic program. The analysis exploits
functional dependencies and mode information, as well as a set of sufficient
conditions to detect mutually exclusive clauses. Their algorithm is not based on abstract
interpretation and assumes that functional dependencies and mode information are
given from outside. Thus the algorithm considers an annotated program; it uses
a set $\{\bot, true, false\}$ where $\bot$ is an initializing value, $true$ means that a
procedure is functional and $false$ means that it is not known whether the procedure is
functional. Hence, the set can be viewed as a domain of abstract sequences, with
concretization function $Cc : \{\bot, true, false\} \to \wp(CPSS)$ defined by

$Cc(\bot) = \{\langle \bot \rangle\}$;

$Cc(true) = \{S \in CPSS \mid Subst(S) \text{ is empty or is a singleton}\}$;

$Cc(false) = CPSS$.
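The three-valued domain can be illustrated by a membership test for its concretization. In the Python sketch below, a sequence is represented, as a simplifying assumption of ours, by a finite list of substitutions (any hashable values) together with a flag telling whether execution terminates; infinite sequences are not represented, and all names are hypothetical.

```python
BOTTOM, TRUE, FALSE = "bottom", "true", "false"


def is_functional(substitutions):
    """Subst(S) is empty or a singleton; repeated occurrences are allowed."""
    return len(set(substitutions)) <= 1


def in_cc(value, substitutions, terminates):
    """Membership of a sequence in the concretization of an abstract value."""
    if value == BOTTOM:
        # Only the sequence that loops without producing any solution.
        return not terminates and len(substitutions) == 0
    if value == TRUE:
        return is_functional(substitutions)
    return True  # FALSE carries no information


print(in_cc(TRUE, ["X=a", "X=a"], True))   # same solution twice: functional
print(in_cc(TRUE, ["X=a", "X=b"], True))   # two distinct solutions: not functional
```

The two calls print `True` and `False`. Note that the looping sequence with no solution belongs to the concretization of all three values, which is consistent with $\bot$ being an initializing value.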

All aspects of their analysis can be accommodated in our approach by providing 
suitable abstract domains. An abstract domain consisting of our cardinality domain 
augmented with a functional dependency component would probably be fairly ac- 
curate. Moreover, in our approach, all analyses can be performed at the same time 
and interact with each other, making it possible to get a better accuracy. 



7 Conclusion 

This paper has introduced a novel abstract interpretation framework, capturing the 
depth-first search strategy and the cut operation of Prolog. The framework is based 
on the notion of substitution sequences and the abstract semantics is defined as a 
pre-consistent post-fixpoint of the abstract transformation. Abstract interpretation 
algorithms need chain-closed domains and a special widening operator to com- 
pute the semantics. This approach overcomes some of the limitations of previous 
frameworks. In particular, it broadens the applicability of the abstract interpreta- 
tion approach to new analyses and can potentially improve the precision of existing 
analyses. On the practical side, in this paper, we have only shown that our approach
allows one to integrate, efficiently and at a low conceptual cost, a predicate level
analysis (i.e., determinacy analysis) with the variable level analyses classically handled by
abstract interpretation. However, the improvement on classical analyses is marginal
because, due to our design choices for the abstract sequence domain (i.e., a simple
extension of Pattern), the new system behaves almost as the previous version of
GAIA for variable level analyses. Nevertheless, the new framework opens a door for
defining and exploiting more sophisticated domains for abstract sequences.
Appendix

We complete here the description of the abstract operations started in Section 5.2.
The correctness proofs of all the abstract operations can be found in (Le Charlier
et al., 1997). The definitions below have been added in order to allow the reader to
check the details of the examples in Section 5.2.



Extension at Clause Entry: $EXTC(c, \cdot) : AS_D \to ASSC_{D'}$

The implementation reuses the homonymous operation from the previous
framework, which is specified as follows.

Operation $EXTC(c, \cdot) : AS_D \to AS_{D'}$

Let $\beta \in AS_D$, $\theta \in CPS_D$, and $\theta' \in PS_{D'}$ such that $x_i\theta' = x_i\theta$ ($\forall i : 1 \leq i \leq n$) and
$x_{n+1}\theta', \ldots, x_m\theta'$ are distinct standard variables not belonging to $codom(\theta)$. Then

$$\theta \in Cc(\beta) \;\Rightarrow\; [\theta'] \in Cc(EXTC(c, \beta)).$$

Hence, the EXTC operation on sequences is defined by

$$EXTC(c, \beta) = \langle \langle EXTC(c, \beta), 1, 1, st \rangle, nocut \rangle.$$



Restriction at Clause Exit: $RESTRC(c, \cdot) : ASSC_{D'} \to ASSC_D$

The treatment of this operation is similar to the previous one. We first specify the
abstract substitution version of the operation.

Operation $RESTRC(c, \cdot) : AS_{D'} \to AS_D$
Let $\beta \in AS_{D'}$ and $\theta \in CPS_{D'}$. We have

$$\theta \in Cc(\beta) \;\Rightarrow\; [\theta_{|D}] \in Cc(RESTRC(c, \beta)).$$

Hence, the RESTRC operation on sequences is defined, for $C = \langle \langle \beta, m, M, t \rangle, acf \rangle$, by

$$RESTRC(c, C) = \langle \langle RESTRC(c, \beta), m, M, t \rangle, acf \rangle.$$



Restriction before a Call: $RESTRG(l, \cdot) : AS_{D'} \to AS_{D'''}$
This operation is simply inherited from the previous framework.

Unification of a Variable and a Functor: $UNIF\text{-}FUNC(l, \cdot) : AS_D \to ASS_D$
The treatment of this operation is identical to the treatment of the UNIF-VAR
operation and is thus omitted.



Extension of the Result of a Call: $EXTGS(l, \cdot, \cdot) : ASSC_{D'} \times ASS_{D'''} \to ASSC_{D'}$
This operation reuses the operation EXTG from the previous framework. The reused
operation has to fulfill the specification just below.

Operation $EXTG(l, \cdot, \cdot) : AS_{D'} \times AS_{D'''} \to AS_{D'}$

Let $\beta_1 \in AS_{D'}$ and $\beta_2 \in AS_{D'''}$. Let $\theta_1 \in CPS_{D'}$ and $\theta_2 \in PS_{D'''}$ be such that
$x_{i_j}\theta_1 = x_j\theta_2$ ($\forall j : 1 \leq j \leq n'$). Let $\sigma \in SS$ such that $dom(\sigma) \subseteq codom(\theta_1)$. Let
$\{z_1, \ldots, z_r\} = codom(\theta_1) \setminus codom(\theta_2)$. Let $y_1, \ldots, y_r$ be distinct standard variables
not belonging to $codom(\theta_1) \cup codom(\sigma)$. Let $\rho = \{z_1/y_1, \ldots, z_r/y_r, y_1/z_1, \ldots, y_r/z_r\}$.
Under these assumptions,

$$\left. \begin{array}{l} \theta_1 \in Cc(\beta_1), \\ {[\theta_2\sigma]} \in Cc(\beta_2) \end{array} \right\} \;\Rightarrow\; [\theta_1\rho\sigma] \in Cc(EXTG(l, \beta_1, \beta_2)).$$



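The implementation of EXTGS is as follows. For arguments $\langle \langle \beta_1, m_1, M_1, t_1 \rangle, acf \rangle$ and
$\langle \beta_2, m_2, M_2, t_2 \rangle$, the result is $\langle \langle \beta', m', M', t' \rangle, acf' \rangle$ where

$$\beta' = EXTG(l, \beta_1, \beta_2);$$

$$m' = \begin{cases} m_1 m_2 & \text{if } t_2 = st, \\ \min(1, m_1)\, m_2 & \text{otherwise;} \end{cases}$$

$$M' = \begin{cases} \min(1, M_1)\, M_2 & \text{if } t_2 = snt, \\ M_1 M_2 & \text{otherwise;} \end{cases}$$

$$t' = \begin{cases} snt & \text{if } t_1 = snt \text{ or } (t_2 = snt \text{ and } m_1 \geq 1), \\ st & \text{if } t_1 = st \text{ and } (t_2 = st \text{ or } M_1 = 0), \\ pt & \text{otherwise;} \end{cases}$$

$$acf' = acf.$$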
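The cardinality and termination components computed by EXTGS can be sketched in Python as follows. The sketch follows the case analysis above; the encoding is an assumption of ours: termination values are the strings `"st"`, `"snt"` and `"pt"`, an unbounded maximum is `math.inf`, a product with a zero factor is taken to be zero, and the function names are hypothetical. The substitution component (computed by EXTG) and the cut flag are left out.

```python
import math


def mul(a, b):
    """Product of cardinality bounds, with 0 * infinity taken to be 0."""
    return 0 if a == 0 or b == 0 else a * b


def extgs_cardinality(m1, M1, t1, m2, M2, t2):
    """Return (m', M', t') for the sequence before a call and the call result.

    (m1, M1, t1) describes the sequence before the call, (m2, M2, t2) the
    sequence produced by the call for one input substitution.
    """
    # Lower bound: every input contributes only if the call surely terminates.
    m = mul(m1, m2) if t2 == "st" else mul(min(1, m1), m2)
    # Upper bound: only the first input is ever tried if the call never terminates.
    M = mul(min(1, M1), M2) if t2 == "snt" else mul(M1, M2)
    if t1 == "snt" or (t2 == "snt" and m1 >= 1):
        t = "snt"
    elif t1 == "st" and (t2 == "st" or M1 == 0):
        t = "st"
    else:
        t = "pt"
    return m, M, t


print(extgs_cardinality(2, 3, "st", 1, 2, "pt"))
```

The call prints `(1, 6, 'pt')`: since the call may loop, only the first of the two guaranteed inputs is sure to yield its solution, while at most $3 \times 2$ solutions can be produced overall.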



Operation $SEQ : ASSC_D \to ASS_D$
We define

$$SEQ(\langle B, acf \rangle) = B.$$

Operation $SUBST : ASSC_{D'} \to AS_{D'}$
We define

$$SUBST(\langle \langle \beta, m, M, t \rangle, acf \rangle) = \beta.$$



